PostgreSQL expression where id is in array - SQL

I have an array of ids such as:
a = [13, 51, 99, 143, 225, 235, 873]
What is the most efficient way of getting the records where the id is in the array?
I don't really want to chain OR conditions, such as WHERE id = 13 OR id = 92, as the array could be extremely long. I've tried this:
select * from authors where id <# [11, 8, 51, 62, 7];
but that's not correct.
Thanks

Use any
select *
from authors
where id = any (array[11, 8, 51, 62, 7]);
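If the list of ids lives in application code, a parameterized query keeps the SQL fixed no matter how long the array gets. A minimal sketch, assuming Python with psycopg2 (the driver choice and connection string are assumptions; the table and column come from the question), relying on psycopg2 adapting a Python list to a PostgreSQL array:
import psycopg2

conn = psycopg2.connect("dbname=mydb")  # hypothetical connection string
a = [13, 51, 99, 143, 225, 235, 873]

with conn.cursor() as cur:
    # psycopg2 adapts the Python list to a PostgreSQL array,
    # so the query text stays the same for any list length.
    cur.execute("SELECT * FROM authors WHERE id = ANY(%s)", (a,))
    rows = cur.fetchall()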

Related

How to work with symbols, numbers, and letters in TensorFlow?

I'm working on my first TensorFlow model, and when I was training the dataset, my accuracy dropped to 25% from around 60% when using scikit-learn. A friend told me it might have to do with some of the data, for example, "781C376B-E380-C052-448B-B4AB6F3D". How do I deal with symbols (dashes here), numbers, and letters in my data when running my models?
Currently I am looking into text vectorization so it can read my data more easily.
You can use tf.strings.unicode_decode(), which converts an encoded string scalar to a vector of code points. It provides a unique number for each character in the string.
For example:
import tensorflow as tf

# A batch of Unicode strings, each represented as a UTF8-encoded string.
batch_utf8 = [s.encode('UTF-8') for s in [u'781C376B-E380-C052-448B-B4AB6F3D']]
batch_chars_ragged = tf.strings.unicode_decode(batch_utf8, input_encoding='UTF-8')
for sentence_chars in batch_chars_ragged.to_list():
    print(sentence_chars)
Output: [55, 56, 49, 67, 51, 55, 54, 66, 45, 69, 51, 56, 48, 45, 67, 48, 53, 50, 45, 52, 52, 56, 66, 45, 66, 52, 65, 66, 54, 70, 51, 68]
For more details, please refer to this document. Thank you.
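To actually feed these codes into a model, the ragged result usually has to be padded into a dense tensor first. A minimal sketch of that step (the padding value of 0 is an assumption; pick one that cannot collide with a real code point in your data):
import tensorflow as tf

batch_utf8 = [u'781C376B-E380-C052-448B-B4AB6F3D'.encode('UTF-8')]
chars = tf.strings.unicode_decode(batch_utf8, input_encoding='UTF-8')
# Pad the ragged rows to equal length so a downstream layer can consume them.
dense = chars.to_tensor(default_value=0)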

Compute moving average on a dynamic rolling window

OK, need some help here! I have the following dataframe.
import pandas as pd

df2 = {'Value': [123, 126, 120, 121, 123, 126, 120, 121, 123, 126],
       'Look-back': [2, 3, 4, 5, 3, 6, 2, 4, 2, 1]}
df2 = pd.DataFrame(df2)
df2
I'd like to add a third column that shows the simple moving average of the 'Value' column with the rolling look-back period given by the 'Look-back' column. My thought was to do this:
df2['Average'] = df2['Value'].rolling(df2['Look-back']).mean()
Of course this doesn't work, because rolling() needs an integer window and I'm supplying a Series.
How do I get what I'm after here?
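One way to approach this is to compute each row's mean over its own look-back window explicitly, since rolling() does not accept a per-row window. A minimal sketch under that assumption (variable_window_mean is a hypothetical helper, not a pandas function):
import numpy as np
import pandas as pd

df2 = pd.DataFrame({'Value': [123, 126, 120, 121, 123, 126, 120, 121, 123, 126],
                    'Look-back': [2, 3, 4, 5, 3, 6, 2, 4, 2, 1]})

def variable_window_mean(values, windows):
    # For row i, average the last windows[i] values up to and including row i.
    vals = np.asarray(values, dtype=float)
    return [vals[max(0, i - w + 1):i + 1].mean()
            for i, w in enumerate(windows)]

df2['Average'] = variable_window_mean(df2['Value'], df2['Look-back'])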

How to eliminate rows in a dataframe that share values with another dataframe? (RStudio)

I need to eliminate the rows in a dataframe that have, in a given column, values in common with the same column of a second dataframe.
The column the code has to take into account contains IDs of subjects, while the rest contain data referring to those subjects.
Example of dataframes (RStudio):
df1<-data.frame(ID=c(13, 16, 25, 36, 25, 17, 50, 63, 61, 34, 65, 17), AnyData=round(runif(12, 1, 5)))
df2<-data.frame(ID=c(89, 57, 13, 17, 18, 21, 51, 50, 72, 84), AnyData=round(runif(10, 1, 5)))
I have tried two approaches, neither of which works:
df1<- filter(df1, ID!=df2[ID])
df1<- df1[-c(which(df1[ID]==df2[ID]))]
The result should be:
df1 <- data.frame(ID=c(16, 25, 36, 25, 63, 61, 34, 65), AnyData=(...))
AnyData depends on the values assigned with runif, so it will vary, but the values must be the same as in the original df1.
What you need is an anti_join():
library(dplyr)
df1 %>%
  anti_join(df2, by = "ID")
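If you prefer to stay in base R, the equivalent is df1[!df1$ID %in% df2$ID, ], which keeps only the rows whose ID does not appear in df2.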

Percentile of every column and row in a dataframe

I have a csv that looks like the image below. I want to calculate the 10th, 50th, and 90th percentiles of each row from B2 to X2 and add each resulting percentile as a new column. Essentially, I want to find the 10th percentile of the average (std, cv, sp_tim, ...) values over the entire period of record available.
So far I have created the following line to read it into Python as a dataframe:
da = pd.read_csv('Project/11433300_annual_flow_matrix.csv', index_col=0, parse_dates=True)
If I have understood your question correctly, then the code below might be helpful.
I have used some dummy data and applied the same kind of treatment you are looking for:
import numpy as np
import pandas as pd

aq = [1, 2, 2, 3, 3, 4, 4, 5, 7, 8, 10, 11]
aw = [91, 25, 13, 53, 95, 94, 75, 35, 57, 88, 111, 12]
df = pd.DataFrame({'aq': aq, 'aw': aw})

n = df.shape[0]
p = 0.1  # for the 10th percentile
position = int(np.ceil(n * p))  # rank-based position of the percentile
df.iloc[position]  # note: this picks the row at that rank, so each column must be sorted for a true percentile
Kindly have a look and let me know if this works for you.
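Since the question asks for row-wise percentiles added as new columns, pandas' DataFrame.quantile with axis=1 may be closer to the goal. A minimal sketch on the same dummy frame (the column names p10/p50/p90 are made up for illustration):
import pandas as pd

df = pd.DataFrame({'aq': [1, 2, 2, 3, 3, 4, 4, 5, 7, 8, 10, 11],
                   'aw': [91, 25, 13, 53, 95, 94, 75, 35, 57, 88, 111, 12]})

value_cols = ['aq', 'aw']  # in the real data: the columns from B to X
for p in (0.1, 0.5, 0.9):
    # quantile(..., axis=1) computes the percentile across each row.
    df[f'p{int(p * 100)}'] = df[value_cols].quantile(p, axis=1)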

matplotlib plot data with nans

I'm surprised how few posts relate to this problem. Anyway, here it is:
I have csv data files containing X values in the first column, and several Y values columns thereafter. But for a given X value not all Y series have a corresponding value. Here is an example:
0, 16, 96, 99
10, 88, 45, 85
20, 85, 61, 10
30, 30, --, 45
40, 82, 28, 82
50, 23, 9, 61
60, 40, 77, 0
70, 26, 21, --
80, --, 58, 99
90, 1, 14, 30
When this csv data is loaded with numpy.genfromtxt, the '--' strings are read as nan, which is good. But when plotting, the lines are interrupted with gaps wherever there is a nan. Is there an option that makes pyplot.plot() ignore both the nan and the corresponding X value?
Not sure if matplotlib has such functionality built in, but you could home-brew it by doing the following:
idx = ~numpy.isnan(Y)  # True where Y has a real value
pyplot.plot(X[idx], Y[idx])
Look at this post.
As proposed in my answer there, I'd recommend using np.isfinite instead of np.isnan; there might be other reasons for your plot to have discontinuities, e.g. inf. Note that with np.isfinite the mask is used directly, without the ~ negation.
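Putting it together for the multi-column CSV in the question, a minimal end-to-end sketch (the file name data.csv is hypothetical):
import numpy as np
from matplotlib import pyplot as plt

# '--' entries fail to parse as floats, so genfromtxt stores them as nan.
data = np.genfromtxt('data.csv', delimiter=',')
x = data[:, 0]
for col in range(1, data.shape[1]):
    y = data[:, col]
    ok = np.isfinite(y)  # drops both nan and inf, along with their x values
    plt.plot(x[ok], y[ok], marker='o')
plt.show()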