After running an AI analysis across tens of thousands of audio files, I end up with this kind of data structure in a Postgres DB:
id | name | tag_1 | tag_2 | tag_3 | tag_4 | tag_5
1 | first song | rock | pop | 80s | female singer | classic rock
2 | second song | pop | rock | jazz | electronic | new wave
3 | third song | rock | funk | rnb | 80s | rnb
Tag positions are really important: the more "to the left" a tag is, the more prominent it is in the song. The number of tags is also finite (50 tags), and the AI always returns 5 of them for every song; no null values are expected.
On the other hand, this is what I have to query:
{"rock" => 15, "pop" => 10, "soul" => 3}
The key is a tag name and the value an arbitrary weight. The number of entries can be anywhere from 1 to 50.
With the example dataset, this query should return [1, 3, 2].
I'm also open to restructuring the data if that would make things easier (raw concatenated strings, for example), but... is this doable using Postgres (tsvectors?), or do I really have to use something like Elasticsearch for this?
After a lot of trial and error, this is what I ended up with, using only Postgres:
Convert the whole dataset to integers, so it ends up looking like this (I also added columns to match the real dataset more closely):
id | bpm | tag_1 | tag_2 | tag_3 | tag_4 | tag_5
1 | 114 | 1 | 2 | 3 | 4 | 5
2 | 102 | 2 | 1 | 6 | 7 | 8
3 | 110 | 1 | 9 | 10 | 3 | 12
Store the requests in an array as strings (note that I sanitized those requests beforehand with some kind of "request builder"; a hypothetical sketch of such a builder follows the array below):
requests = [
"bpm BETWEEN 110 AND 124
AND tag_1 = 1
AND tag_2 = 2
AND tag_3 = 3
AND tag_4 = 4
AND tag_5 = 5",
"bpm BETWEEN 110 AND 124
AND tag_1 = 1
AND tag_2 = 2
AND tag_3 = 3
AND tag_4 = 4
AND tag_5 IN (1, 3, 5)",
"bpm BETWEEN 110 AND 124
AND tag_1 = 1
AND tag_2 = 2
AND tag_3 = 3
AND tag_4 IN (1, 3, 5)
AND tag_5 IN (1, 3, 5)",
....
]
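For reference, here is a hypothetical sketch of what such a request builder could look like (illustrative names only, not my actual builder). It takes the integer tag ids sorted by weight plus a BPM range, and relaxes the position constraints from right to left:
# Hypothetical sketch of a request builder (names are made up).
# tag_ids are the integer tag ids sorted by weight, heaviest first.
def build_requests(tag_ids, bpm_range)
  base = "bpm BETWEEN #{bpm_range.min} AND #{bpm_range.max}"
  (0..tag_ids.length).map do |relaxed|
    conditions = tag_ids.each_with_index.map do |tag_id, i|
      if i < tag_ids.length - relaxed
        "tag_#{i + 1} = #{tag_id}"                      # exact position match
      else
        "tag_#{i + 1} IN (#{tag_ids.join(', ')})"       # relaxed: any requested tag
      end
    end
    ([base] + conditions).join("\nAND ")
  end
end

requests = build_requests([1, 2, 3, 4, 5], 110..124)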
Then simply loop over the requests array, from the most precise to the most approximate one:
# Ruby / ActiveRecord example
track_ids = []
requests.each do |request|
  scope = Track.where(request)
  # Exclude tracks already returned by a more precise request
  scope = scope.where("tracks.id NOT IN (?)", track_ids) if track_ids.any?
  track_ids += scope.pluck(:id)
  break if track_ids.length > 200
end
... and done! All my songs are ordered by similarity, with the closest match at the top; the further down the list, the more approximate the matches get. Since everything is integers, it's pretty fast (fast enough on a 100K-row dataset), and the output looks like pure magic. Bonus point: it is still easily tweakable and maintainable by the whole team.
I do understand this is rough, so I'm open to any more efficient way of doing the same thing, even if it means adding something to the stack (ES?), but so far it's a simple solution that just works.
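For comparison, here is a sketch of what a single-query alternative could look like, computing a weighted score directly in Postgres. It assumes the integer layout above and a tracks table; the tag ids and weights in the CTE are illustrative (e.g. rock = 1 with weight 15):
-- Sketch: rank tracks by a weighted score in a single query.
-- The weights come from the request (tag_id => weight); the multiplier (6 - pos)
-- makes left-most tag positions count more. Ids and weights are illustrative.
WITH weights(tag_id, weight) AS (
  VALUES (1, 15), (2, 10), (9, 3)
)
SELECT t.id,
       SUM(w.weight * (6 - p.pos)) AS score
FROM tracks t
CROSS JOIN LATERAL (VALUES (1, t.tag_1), (2, t.tag_2), (3, t.tag_3),
                           (4, t.tag_4), (5, t.tag_5)) AS p(pos, tag_id)
JOIN weights w ON w.tag_id = p.tag_id
WHERE t.bpm BETWEEN 110 AND 124
GROUP BY t.id
ORDER BY score DESC
LIMIT 200;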
In class, we were given this static method which we are asked to test. The method is supposed to (but won’t always) return the same integer value that was given as input.
static int identity(int x) {
if (20 <= x && x <= 30) {
x /= 2;
}
if (5 <= x && x <= 15) {
x *= 2;
}
return x;
}
The question asks us to create the minimum set of tests that has "complete path coverage". Since there are two conditional statements, you would expect to generate 2^n tests, which is 4 in this instance. However, it is impossible to create a test where the first condition is true and the second condition is false: if the first condition holds, x is between 20 and 30, so x /= 2 leaves it between 10 and 15, which always satisfies the second condition. Does this mean that the minimum number of tests that has "complete path coverage" is 3?
From the POV of a tester, I would look at the edges of your ranges to determine your test coverage. You don't want to just make sure that the if statements are executed - you also want to check the boundaries of the ranges as well as one input within each range.
Given this, I would test the following inputs and expect the following outputs:
| input | output |
| 19 | 19 | (just outside first boundary minimum)
| 20 | 20 | (just inside first boundary minimum)
| 24 | 24 | (value within the range)
| 30 | 30 | (just inside first boundary maximum)
| 31 | 31 | (just outside first boundary maximum)
| 4 | 4 | (just outside second boundary minimum)
| 5 | 10 | (just inside second boundary minimum)
| 10 | 20 | (value within the range)
| 15 | 30 | (just inside second boundary maximum)
| 16 | 16 | (just outside second boundary maximum)
If you want to read more, google boundary testing.
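For illustration, here is a minimal JUnit 5 sketch of these boundary tests (it assumes the method lives in a class named Identity; class and test names are made up):
// Minimal JUnit 5 sketch of the boundary tests above (illustrative names).
import static org.junit.jupiter.api.Assertions.assertEquals;
import org.junit.jupiter.params.ParameterizedTest;
import org.junit.jupiter.params.provider.CsvSource;

class IdentityTest {
    @ParameterizedTest
    @CsvSource({
        "19, 19",  // just outside first boundary minimum
        "20, 20",  // just inside first boundary minimum
        "24, 24",  // value within the first range
        "30, 30",  // just inside first boundary maximum
        "31, 31",  // just outside first boundary maximum
        "4, 4",    // just outside second boundary minimum
        "5, 10",   // just inside second boundary minimum
        "10, 20",  // value within the second range
        "15, 30",  // just inside second boundary maximum
        "16, 16"   // just outside second boundary maximum
    })
    void boundaryValues(int input, int expected) {
        assertEquals(expected, Identity.identity(input));
    }
}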
I've set up an Apache (2.4) load balancer, which is working okay. To monitor its performance, I enabled the balancer-manager handler, which shows the status of the balancers.
I noticed a "Load" column, which was not present in version 2.2, with a value that may even be negative, but I don't understand its meaning, nor was I able to find any documentation about it.
Can anyone explain the meaning of that value or point me to the right documentation?
I now understand how the "Load" calculation works. Here is, I think, a simpler example than the one on the Apache documentation page.
Let's say we have 3 workers, each with a configured load factor of 1.
1) Start
a | b | c
--+---+---
0 | 0 | 0
add the load factor of 1 to all workers
a | b | c
--+---+---
1 | 1 | 1
now select the one with the highest value --> a, and decrease it by the sum of all factors (= 3); this is the selected worker
a | b | c
---+---+---
-2 | 1 | 1
2) next round, add 1 to all again
a | b | c
---+---+---
-1 | 2 | 2
now select the one with the highest value --> b, and decrease it by the sum of all factors (= 3); this is the selected worker
a | b | c
---+----+----
-1 | -1 | 2
3) next round, add 1 again
a | b | c
---+----+----
0 | 0 | 3
now select the one with the highest value --> c, and decrease it by the sum of all factors (= 3); this is the selected worker
a | b | c
---+----+----
0 | 0 | 0
start over again :)
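To make the selection rule concrete, here is a small standalone C sketch that simulates the rounds above (3 workers, load factor 1). It is just an illustration, not the actual mod_proxy code:
/* Minimal sketch of the byrequests selection described above
   (3 workers, load factor 1; variable names are illustrative). */
#include <stdio.h>

int main(void) {
    int lbstatus[3] = {0, 0, 0};
    int lbfactor = 1, nworkers = 3, total = lbfactor * nworkers;

    for (int round = 0; round < 6; round++) {
        int selected = 0;
        for (int i = 0; i < nworkers; i++) {
            lbstatus[i] += lbfactor;               /* add the factor to every worker */
            if (lbstatus[i] > lbstatus[selected])
                selected = i;                      /* pick the highest lbstatus */
        }
        lbstatus[selected] -= total;               /* charge the selected worker */
        printf("round %d: worker %c selected, load = {%d, %d, %d}\n",
               round + 1, 'a' + selected, lbstatus[0], lbstatus[1], lbstatus[2]);
    }
    return 0;
}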
I hope this helps others.
The Load value is populated by lbstatus based on this line of code:
ap_rprintf(r, "<td>%d</td><td>", worker->s->lbstatus);
in https://svn.apache.org/viewvc/httpd/httpd/trunk/modules/proxy/mod_proxy_balancer.c?view=markup#l1767 (the line number may change as the code is modified)
Since your balancing method is byrequests, lbstatus is defined by mod_lbmethod_byrequests, which describes it as:
lbstatus is how urgent this worker has to work to fulfill its quota of
work.
Details on the algorithm can be found here: https://httpd.apache.org/docs/2.4/mod/mod_lbmethod_byrequests.html
I too would like to know the description of the other columns like BUSY, ELECTED etc. My LB has BUSY over 100 already; I thought BUSY should not exceed 100 (as in 100% server busyness or something).
How do I manually set the split-long-rows size for Octave's terminal output?
I am using Octave through the Sublime Text output build panel, and Octave cannot correctly detect how wide the terminal is, i.e. how many columns it can fill before splitting.
For example, it is currently filling the screen like this:
octave:13> rand (2,10)
ans =
Columns 1 through 6:
0.75883 0.93290 0.40064 0.43818 0.94958 0.16467
0.75697 0.51942 0.40031 0.61784 0.92309 0.40201
Columns 7 through 10:
0.90174 0.11854 0.72313 0.73326
0.44672 0.94303 0.56564 0.82150
But I want it to display all 10 columns (Columns 1 through 10) instead of splitting at Columns 1 through 6.
If I disable split_long_rows, it never splits.
Query or set the internal variable that controls whether rows of a
matrix may be split when displayed to a terminal window.
If the rows are split, Octave will display the matrix in a series of
smaller pieces, each of which can fit within the limits of your
terminal width and each set of rows is labeled so that you can easily
see which columns are currently being displayed.
https://www.gnu.org/software/octave/doc/v4.0.1/Matrices.html#XREFsplit_005flong_005frows
You cannot split them like that. The default Octave output is just a simple and fast way to debug your program. To print things exactly the way you want, just create a function for it and have it do the printing.
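For instance, here is a minimal sketch that prints every row of a numeric matrix on a single line, ignoring the terminal width Octave detects:
% Minimal sketch: print every row of a matrix on a single line,
% regardless of Octave's terminal-width detection.
a = rand (2, 10);
for i = 1 : rows (a)
    printf ("%8.5f ", a(i, :));
    printf ("\n");
end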
This is a similar example, where a table is printed:
...
for i = 2 : 7
...
% https://www.gnu.org/software/octave/doc/v4.0.1/Basic-Usage-of-Cell-Arrays.html
results(end+1).vector = { m, gaussLegendreIntegral________, gaussLegendreIntegralErroExato___ };
end
printf( "%20s | %30s | %30s\n", "m", "Gm", "Erro Exato Gm = |Gm - Ie |" )
printf( "%20s | %30s | %30s\n", "--------------------", "------------------------------", "------------------------------" )
numberToStringPrecision = 15;
for i = 1 : numel( results )
# https://www.gnu.org/software/octave/doc/v4.0.0/Processing-Data-in-Cell-Arrays.html
# https://www.gnu.org/software/octave/doc/v4.0.1/Converting-Numerical-Data-to-Strings.html#XREFnum2str
printf( "%20s | ", num2str( cell2mat( results(i).vector(1) ), numberToStringPrecision ) )
printf( "%30s | ", num2str( cell2mat( results(i).vector(2) ), numberToStringPrecision ) )
printf( "%30s\n" , num2str( cell2mat( results(i).vector(3) ), numberToStringPrecision ) )
end
It would generate an output like this:
m | Gm | Erro Exato Gm = |Gm - Ie |
-------------------- | ------------------------------ | ------------------------------
2 | -0.895879734614027 | 0.104120265385973
3 | -0.947672383858322 | 0.0523276161416784
4 | -0.968535977854582 | 0.0314640221454183
5 | -0.979000992287376 | 0.0209990077126242
6 | -0.984991210262343 | 0.0150087897376568
7 | -0.988738923004894 | 0.0112610769951058
Given: {{1,"a"},{2,"b"},{3,"c"}}
Desired:
foo | bar
-----+------
1 | a
2 | b
3 | c
You can get the intended result with the following query; however, it'd be better to have something that scales with the size of the array.
SELECT arr[subscript][1] as foo, arr[subscript][2] as bar
FROM ( select generate_subscripts(arr,1) as subscript, arr
from (select '{{1,"a"},{2,"b"},{3,"c"}}'::text[][] as arr) input
) sub;
This works:
select key as foo, value as bar
from json_each_text(
json_object('{{1,"a"},{2,"b"},{3,"c"}}')
);
Result:
foo | bar
-----+------
1 | a
2 | b
3 | c
Docs: json_object and json_each_text in the PostgreSQL JSON functions documentation.
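If foo should come back as an integer rather than text, the key can simply be cast (same query, just with a cast added):
select key::int as foo, value as bar
from json_each_text(
  json_object('{{1,"a"},{2,"b"},{3,"c"}}')
);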
I'm not sure what exactly you mean by "it'd be better to have something that scales with the size of the array". Of course you cannot have extra columns added to the result set as the inner array size grows, because PostgreSQL must know the exact columns of a query before executing it (so before it begins to read the string).
But I would like to propose converting the string into a normal relational representation of the matrix:
select i, j, arr[i][j] a_i_j from (
select i, generate_subscripts(arr,2) as j, arr from (
select generate_subscripts(arr,1) as i, arr
from (select ('{{1,"a",11},{2,"b",22},{3,"c",33},{4,"d",44}}'::text[][]) arr) input
) sub_i
) sub_j
Which gives:
i | j | a_i_j
--+---+------
1 | 1 | 1
1 | 2 | a
1 | 3 | 11
2 | 1 | 2
2 | 2 | b
2 | 3 | 22
3 | 1 | 3
3 | 2 | c
3 | 3 | 33
4 | 1 | 4
4 | 2 | d
4 | 3 | 44
Such a result may be rather usable in further data processing, I think.
Of course, such a query can only handle arrays with a predefined number of dimensions, but all the array sizes in each of its dimensions can change without rewriting the query, so this is a somewhat more flexible approach.
ADDITION: Yes, using WITH RECURSIVE one can build a similar query capable of handling arrays with an arbitrary number of dimensions. Nonetheless, there is no way to overcome the limitation imposed by the relational data model: the exact set of columns must be defined at query parse time, and there is no way to delay this until execution time. So we are forced to store all the indices in one column, using another array.
Here is a query that extracts all elements from an arbitrary multi-dimensional array along with their zero-based indices (stored in another one-dimensional array):
with recursive extract_index(k,idx,elem,arr,n) as (
  -- flatten the array and number its elements in row-major order (zero-based k)
  select (row_number() over())-1 k, idx, elem, arr, n from (
    select array[]::bigint[] idx, unnest(arr) elem, arr, array_ndims(arr) n
    from ( select '{{{1,"a"},{11,111}},{{2,"b"},{22,222}},{{3,"c"},{33,333}},{{4,"d"},{44,444}}}'::text[] arr ) input
  ) plain_indexed
  union all
  -- peel off one dimension per step: the remainder is the index within the current
  -- innermost remaining dimension, the quotient is carried over to the next step
  select k/array_length(arr,n)::bigint k, array_prepend(k%array_length(arr,n),idx) idx, elem, arr, n-1 n
  from extract_index
  where n!=1
)
select array_prepend(k,idx) idx, elem from extract_index where n=1
Which gives:
idx | elem
--------+-----
{0,0,0} | 1
{0,0,1} | a
{0,1,0} | 11
{0,1,1} | 111
{1,0,0} | 2
{1,0,1} | b
{1,1,0} | 22
{1,1,1} | 222
{2,0,0} | 3
{2,0,1} | c
{2,1,0} | 33
{2,1,1} | 333
{3,0,0} | 4
{3,0,1} | d
{3,1,0} | 44
{3,1,1} | 444
Formally, this seems to prove the concept, but I wonder what a real practical use one could make out of it :)