If I have a dataframe of the form:
tag element_id
1 12
1 13
1 15
2 12
2 13
2 19
3 12
3 15
3 22
how can I compute the overlaps of the tags in terms of the element_id ? The result I guess should be an overlap matrix of the form:
1 2 3
1 X 2 2
2 2 X 1
3 2 1 X
where I put X on the diagonal since the overlap of a tag with itself is not relevant and where the numbers in the matrix represent the total element_ids that the two tags share.
My attempts:
You can try and use a for loop like :
for item in df.itertuples():
element_lst += [item.element_id]
element_tag = item.tag
# then intersect the element_list row by row.
# This is extremely costly for large datasets
The second thing I was thinking about was to use df.groupby('tag') and try to somehow intersect on element_id, but it is not clear to me how I can do that with grouped data.
merge + crosstab
# Find element overlap, remove same tag matches
res = df.merge(df, on='element_id').query('tag_x != tag_y')
pd.crosstab(res.tag_x, res.tag_y)
Output:
tag_y 1 2 3
tag_x
1 0 2 2
2 2 0 1
3 2 1 0
I have some data that I want to display in scatter chart. I have the following two dimensions:
Dimension1: This is each record in the table - say unique id for each row. So the number of dots should be equal to number of records.
Dimension2: This is a combination of 2 columns. tp and vc. Colors of each dot is based on these 2 columns.
tp vc
1 a 1
2 b 2
3 c 1
So there will be dots of 3 colors based on the above tp and vc combinations. Then there are 3 expressions representing X and Y and Size of dot. I am not sure how to configure the dimensions to achieve the goal.
Thanks
You will need a calculated dimmension which is the concatanation expression defined as =tp & vc in your case.
Then this will be your single dimmension. Then your x,y,size expressions make up the remaining requirements for this chart.
This will give you three colors, one for each unique record combination and they will be labled a1 and b2 and c1.
id tp vc x y size
1 | a | 1 | 3 | 5 | 7
2 | b | 2 | 1 | 2 | 10
3 | c | 1 | 9 | 5 | 5
I have to manipulate range of cells in excel using VB. Can I do it in the following manner ?
Range("a1:b5")=[1 2 3 4 5 6 7 8 9 0]
I don't think so, I've never seen that syntax. One thing you can do is something like:
For Row = 1 To 5
Range("a" + CStr(Row)).Value = Row
Range("b" + CStr(Row)).Value = (Row + 5) Mod 10
Next Row
assuming that you want it set up thus:
A B
+------
1 | 1 6
2 | 2 7
3 | 3 8
4 | 4 9
5 | 5 0
You may need to use Mid(CStr(Row),2) - I can't remember off the top of my head if Cstr gives you a leading space for non-negative numbers.
Given: {{1,"a"},{2,"b"},{3,"c"}}
Desired:
foo | bar
-----+------
1 | a
2 | b
3 | c
You can get the intended result with the following query; however, it'd be better to have something that scales with the size of the array.
SELECT arr[subscript][1] as foo, arr[subscript][2] as bar
FROM ( select generate_subscripts(arr,1) as subscript, arr
from (select '{{1,"a"},{2,"b"},{3,"c"}}'::text[][] as arr) input
) sub;
This works:
select key as foo, value as bar
from json_each_text(
json_object('{{1,"a"},{2,"b"},{3,"c"}}')
);
Result:
foo | bar
-----+------
1 | a
2 | b
3 | c
Docs
Not sure what exactly you mean saying "it'd be better to have something that scales with the size of the array". Of course you can not have extra columns added to resultset as the inner array size grows, because postgresql must know exact colunms of a query before its execution (so before it begins to read the string).
But I would like to propose converting the string into normal relational representation of matrix:
select i, j, arr[i][j] a_i_j from (
select i, generate_subscripts(arr,2) as j, arr from (
select generate_subscripts(arr,1) as i, arr
from (select ('{{1,"a",11},{2,"b",22},{3,"c",33},{4,"d",44}}'::text[][]) arr) input
) sub_i
) sub_j
Which gives:
i | j | a_i_j
--+---+------
1 | 1 | 1
1 | 2 | a
1 | 3 | 11
2 | 1 | 2
2 | 2 | b
2 | 3 | 22
3 | 1 | 3
3 | 2 | c
3 | 3 | 33
4 | 1 | 4
4 | 2 | d
4 | 3 | 44
Such a result may be rather usable in further data processing, I think.
Of course, such a query can handle only array with predefined number of dimensions, but all array sizes for all of its dimensions can be changed without rewriting the query, so this is a bit more flexible approach.
ADDITION: Yes, using with recursive one can build resembling query, capable of handling array with arbitrary dimensions. None the less, there is no way to overcome the limitation coming from relational data model - exact set of columns must be defined at query parse time, and no way to delay this until execution time. So, we are forced to store all indices in one column, using another array.
Here is the query that extracts all elements from arbitrary multi-dimensional array along with their zero-based indices (stored in another one-dimensional array):
with recursive extract_index(k,idx,elem,arr,n) as (
select (row_number() over())-1 k, idx, elem, arr, n from (
select array[]::bigint[] idx, unnest(arr) elem, arr, array_ndims(arr) n
from ( select '{{{1,"a"},{11,111}},{{2,"b"},{22,222}},{{3,"c"},{33,333}},{{4,"d"},{44,444}}}'::text[] arr ) input
) plain_indexed
union all
select k/array_length(arr,n)::bigint k, array_prepend(k%array_length(arr,2),idx) idx, elem, arr, n-1 n
from extract_index
where n!=1
)
select array_prepend(k,idx) idx, elem from extract_index where n=1
Which gives:
idx | elem
--------+-----
{0,0,0} | 1
{0,0,1} | a
{0,1,0} | 11
{0,1,1} | 111
{1,0,0} | 2
{1,0,1} | b
{1,1,0} | 22
{1,1,1} | 222
{2,0,0} | 3
{2,0,1} | c
{2,1,0} | 33
{2,1,1} | 333
{3,0,0} | 4
{3,0,1} | d
{3,1,0} | 44
{3,1,1} | 444
Formally, this seems to prove the concept, but I wonder what a real practical use one could make out of it :)
Does there exist a Postgres Aggregator such that, when used on the following table:
id | value
----+-----------
1 | 1
2 | 1
3 | 2
4 | 2
5 | 3
6 | 3
7 | 3
8 | 4
9 | 4
10 | 5
in a query such as:
select agg_function(4,value) from mytable where id>5
will return
agg_function
--------------
t
(a boolean true result) because a row or rows with value=4 were selected?
In other words, one argument specifies the value you are looking for, the other argument takes the column specifier, and it returns true if the column value was equal to the specified value for one or more rows?
I have successfully created an aggregate to do just that, but I'm wondering if I have just re-created the wheel...
select sum(case when value = 4 then 1 else 0 end) > 0
from mytable
where id > 5