Remove the last element from a json[]? - sql

I have a json[] array (_result_group) in PostgreSQL 9.4, and I want to remove its last json element (_current). I appended it with:
_result_group := (SELECT array_append(_result_group,_current));
And tried to remove with:
SELECT _result_group[1:array_length(_result_group,1) -1] INTO _result_group;
But it didn't work.
How can I do this?

To remove the last element from any array (including json[]) in Postgres 9.4, within PL/pgSQL code:
_result_group := _result_group[1:cardinality(_result_group)-1];
This assumes a 1-dimensional array with default subscripts starting at 1.
Empty array input yields an empty array, and null input yields null.
According to the manual, cardinality() ...
returns the total number of elements in the array, or 0 if the array is empty
Then just take the array slice from 1 to cardinality -1.
Then again, your attempt should work as well:
SELECT _result_group[1:array_length(_result_group,1) -1] INTO _result_group;
For non-standard array subscripts see:
Normalize array subscripts for 1-dimensional array so they start with 1
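As a rough model of the slice above, written in Python rather than PL/pgSQL (the function name drop_last is made up for illustration), the 1-based, inclusive slice [1:cardinality-1] behaves roughly like this sketch:

```python
from typing import Optional


def drop_last(arr: Optional[list]) -> Optional[list]:
    """Hypothetical Python model of arr[1:cardinality(arr)-1] in Postgres."""
    if arr is None:
        # A null array stays null.
        return None
    upper = len(arr) - 1  # cardinality(arr) - 1, an inclusive 1-based bound
    # Slice bounds are 1-based and inclusive, so [1:upper] keeps the first
    # `upper` elements; an upper bound below 1 gives an empty array.
    return arr[:max(upper, 0)]


print(drop_last(["a", "b", "c"]))
print(drop_last([]))
print(drop_last(None))
```

This only mirrors the behavior described in the answer for 1-dimensional arrays with default subscripts; it is not how Postgres implements slicing.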

Related

Inconsistent behavior when slicing a 2d array in PostgreSQL

Let's say I have a 2d array:
# SELECT ARRAY[ARRAY[1,2], ARRAY[3,4]];
array
---------------
{{1,2},{3,4}}
(1 row)
Now, if I want to get the first element of each inner array, adding (...)[:][1] will do the trick:
# SELECT (ARRAY[ARRAY[1,2], ARRAY[3,4]])[:][1];
array
-----------
{{1},{3}}
(1 row)
But if I want to obtain the second element of each inner array, I have to add (...)[:][2:2] instead, as (...)[:][2] returns the untouched array:
# SELECT (ARRAY[ARRAY[1,2], ARRAY[3,4]])[:][2];
array
---------------
{{1,2},{3,4}}
(1 row)
# SELECT (ARRAY[ARRAY[1,2], ARRAY[3,4]])[:][2:2];
array
-----------
{{2},{4}}
(1 row)
What is the reason for this inconsistent behavior?
I think the documentation explains this pretty well:
If any dimension is written as a slice, i.e., contains a colon, then all dimensions are treated as slices. Any dimension that has only a single number (no colon) is treated as being from 1 to the number specified.
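That is, when you are using slices, Postgres expects all dimensions to be slices. Those that are not default to 1:n. So [:][1] is read as [:][1:1], which happens to return the first element of each inner array, while [:][2] is read as [:][1:2], which is the whole array.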
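To make the quoted rule concrete, here is a small Python model of it (not Postgres code; the names to_bounds and slice_2d are invented for this sketch). It assumes a 2d array with default 1-based subscripts and at least one dimension written as a slice:

```python
from typing import List, Optional, Tuple, Union

# A subscript is None for "[:]", an int n for "[n]", or (lo, hi) for "[lo:hi]".
Subscript = Union[None, int, Tuple[int, int]]


def to_bounds(sub: Subscript, length: int) -> Tuple[int, int]:
    """Turn one subscript into 1-based, inclusive (lo, hi) bounds."""
    if sub is None:
        return 1, length  # "[:]" covers the whole dimension
    if isinstance(sub, int):
        return 1, sub     # a lone number n is treated as 1:n
    return sub


def slice_2d(arr: List[list], first: Subscript, second: Subscript) -> List[list]:
    """Apply both subscripts as slices, as the documentation describes."""
    lo1, hi1 = to_bounds(first, len(arr))
    result = []
    for row in arr[lo1 - 1:hi1]:
        lo2, hi2 = to_bounds(second, len(row))
        result.append(row[lo2 - 1:hi2])
    return result


arr = [[1, 2], [3, 4]]
print(slice_2d(arr, None, 1))       # models (...)[:][1]
print(slice_2d(arr, None, 2))       # models (...)[:][2]
print(slice_2d(arr, None, (2, 2)))  # models (...)[:][2:2]
```

The three calls reproduce the three query results shown in the question: the lone 1 becomes 1:1, the lone 2 becomes 1:2, and only the explicit 2:2 isolates the second element.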

AttributeError: 'int' object has no attribute 'count' while using itertuples() method with dataframes

I am trying to iterate over the rows of a pandas DataFrame using the itertuples() method, which works quite well for my case. Now I want to check whether a specific value ('x') is in a specific tuple. I used the count() method for that, as I need the number of occurrences of x later.
The weird part is that for some tuples this works just fine (e.g. in my case (namedtuple[7].count('x')) + (namedtuple[8].count('x'))), but for others (e.g. namedtuple[9].count('x')) I get AttributeError: 'int' object has no attribute 'count'.
I would appreciate your help very much!
Apparently, some columns of your DataFrame are of object type (actually strings)
and some of them are of int type (more generally, numbers).
To count occurrences of x in each row, apply a function to each element of the row that
checks whether the element is a str;
if it is, returns count('x');
if not, returns 0 (don't attempt to look for x in a number).
Applied to a row, this function returns a Series with the number of x in each column
(separately), so to compute the total for the whole row, this Series should
be summed.
Example of working code:
Test DataFrame:
C1 C2 C3
0 axxv bxy 10
1 vx cy 20
2 vv vx 30
Code:
for ind, row in df.iterrows():
    # Count 'x' only in string cells; numeric cells contribute 0.
    print(ind, row.apply(lambda it:
        it.count('x') if isinstance(it, str) else 0).sum())
(in my opinion, iterrows is more convenient here).
The result is:
0 3
1 1
2 1
So as you can see, it is possible to count occurrences of x,
even when some columns are not strings.
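Since the question uses itertuples(), the same type check can also be applied to the namedtuples it yields. The sketch below needs no pandas: Row is a hypothetical stand-in for one of those namedtuples (with the index as the first field), and count_x is a made-up helper name.

```python
from collections import namedtuple

# Hypothetical stand-in for the namedtuples that itertuples() yields;
# the first field plays the role of the row index.
Row = namedtuple("Row", ["Index", "C1", "C2", "C3"])

rows = [
    Row(0, "axxv", "bxy", 10),
    Row(1, "vx", "cy", 20),
    Row(2, "vv", "vx", 30),
]


def count_x(row, char="x"):
    """Sum the occurrences of `char` over the string fields of one row."""
    # Skip the index (field 0) and ignore anything that is not a str,
    # which avoids calling .count() on an int.
    return sum(field.count(char) for field in row[1:] if isinstance(field, str))


for row in rows:
    print(row.Index, count_x(row))
```

With the test data above this prints the same totals as the iterrows version (3, 1 and 1).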

Iterate on OrientRecord object

I am trying to increment twice in a loop and print the OrientRecord Objects using Python.
Following is my code -
for items in iteritems:
    x = items.oRecordData
    print(x['attribute1'])
    y = next(items).oRecordData  # Here is the error
    print(y['attribute2'])
Here, iteritems is a list of OrientRecord objects. I have to print attributes of two consecutive objects in one loop.
I am getting the following error -
TypeError: 'OrientRecord' object is not an iterator
Try a different approach and index into the list two records at a time:
for i in range(0, len(iteritems), 2):
    x = iteritems[i].oRecordData
    print(x['attribute1'])
    y = iteritems[i+1].oRecordData
    print(y['attribute2'])
The range() function starts at 0 and advances in steps of 2.
However, this works properly only if the number of records is even; otherwise it raises:
IndexError: list index out of range
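One way to avoid that IndexError is to pair the records with a single shared iterator and itertools.zip_longest, which pads a trailing unpaired record with None. This is only a sketch: FakeRecord is a hypothetical stand-in for OrientRecord that models nothing but the oRecordData attribute, and print_pairs is an invented helper name.

```python
from itertools import zip_longest


class FakeRecord:
    """Hypothetical stand-in for an OrientRecord; only oRecordData is modeled."""

    def __init__(self, data):
        self.oRecordData = data


def print_pairs(records):
    """Print attribute1 of each odd-positioned record and attribute2 of the next."""
    printed = []
    it = iter(records)
    # Passing the same iterator twice yields consecutive pairs; for an odd
    # number of records the last pair is (record, None).
    for first, second in zip_longest(it, it):
        printed.append(first.oRecordData['attribute1'])
        if second is not None:
            printed.append(second.oRecordData['attribute2'])
    for value in printed:
        print(value)
    return printed


iteritems = [
    FakeRecord({'attribute1': 'a1', 'attribute2': 'a2'}),
    FakeRecord({'attribute1': 'b1', 'attribute2': 'b2'}),
    FakeRecord({'attribute1': 'c1', 'attribute2': 'c2'}),
]
print_pairs(iteritems)
```

The original error arises because next() expects an iterator, and a single record is not one; calling iter() on the list gives an object that next() (and zip_longest) can advance.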
I hope this helps.

How can I Select nth element from an array's 2nd dimension?

I have a 2-dimensional int array, and I'd like to get the 2nd element from every array in the 2nd dimension. So for example, I'd like to get 2, 4, and 6 from the array literal '{{1,2},{3,4},{5,6}}'. Is this possible? I've searched the docs but I haven't found anything that can do what I want.
unnest(arr[:][2:2]) will give you a table expression for what you want (where arr is the name of your array column).
If you want a 1-dimensional array of those elements, you can use array(select * from unnest(arr[:][2:2])), because arr[:][2:2] on its own is still a 2-dimensional array.
http://rextester.com/VLOJ18858
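The two steps in this answer (slice, then flatten) can be pictured with a short Python analogy; this is not Postgres code, and nth_of_each is a made-up name:

```python
from itertools import chain


def nth_of_each(arr, n):
    """Return the n-th (1-based) element of every inner list as a flat list."""
    # Like arr[:][n:n]: the result is still nested, e.g. [[2], [4], [6]].
    sliced = [row[n - 1:n] for row in arr]
    # Like unnest(...) wrapped in array(select ...): flatten to one dimension.
    return list(chain.from_iterable(sliced))


print(nth_of_each([[1, 2], [3, 4], [5, 6]], 2))
```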

Numpy maximum(arrays)--how to determine the array each max value came from

I have numpy arrays representing July temperature for each year since 1950.
I would like to use numpy.maximum(temp1950,temp1951,temp1952,..temp2014)
to determine the maximum July temperature at each cell,
but numpy.maximum() compares only two arrays.
How do I get the maximum for each cell, and how do I determine the year that each max value came from?
Thanks to Praveen, the following works fine:
import numpy

array1 = numpy.array(([1, 2], [3, 4]))
array2 = numpy.array(([3, 4], [1, 2]))
array3 = numpy.array(([9, 1], [1, 9]))
all_arrays = numpy.dstack((array1, array2, array3))
# maxvalues = numpy.maximum(all_arrays)  # will not work
all_arrays.max(axis=2)  # this returns the max from each cell location
max_indexes = numpy.argmax(all_arrays, axis=2)  # this returns the correct indexes
The answer is argmax, except that you need to do this along the required axis. If you have 65 years' worth of temperatures, it doesn't make sense to keep them in separate arrays.
Instead, put them all into a single 2D array using something like np.vstack and then take the argmax over rows.
alltemps = np.vstack((temp1950, temp1951, ..., temp2014))
maxindexes = np.argmax(alltemps, axis=0)
If your temperature arrays are already 2D for some reason, then you can use np.dstack to stack in depth instead. Then you'll have to take argmax over axis=2.
For the specific example in your question, you're looking for something like:
t = np.dstack((array1, array2))  # Note the double parentheses: you need to pass
                                 # a tuple to the function
maxindexes = np.argmax(t, axis=2)
PS: If you are getting the data out of a file, I suggest putting them in a single array to start with. It gets hard to handle 65 variable names.
You need to use NumPy's argmax.
It gives you the index of the largest element in the array, which you can map to the year.
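A minimal sketch of that index-to-year mapping, assuming the arrays are stacked in chronological order. The three small arrays are the ones from the example above, and the years 1950 to 1952 are assigned to them purely for illustration:

```python
import numpy as np

# Illustrative assumption: one year label per stacked array, in stacking order.
years = np.arange(1950, 1953)

temps = [
    np.array([[1, 2], [3, 4]]),  # stands in for temp1950
    np.array([[3, 4], [1, 2]]),  # stands in for temp1951
    np.array([[9, 1], [1, 9]]),  # stands in for temp1952
]

# Stack along a new leading axis: shape (n_years, rows, cols).
stacked = np.stack(temps, axis=0)

max_temp = stacked.max(axis=0)          # maximum per cell
max_index = np.argmax(stacked, axis=0)  # which array each maximum came from
max_year = years[max_index]             # translate indexes into years

print(max_temp)
print(max_year)
```

Here np.stack with axis=0 is used instead of np.dstack, so the reduction runs over axis=0 rather than axis=2; the result is the same. If two years tie for a cell, argmax reports the first one in stacking order.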