I have an array of ids such as:
a = [13, 51, 99, 143, 225, 235, 873]
What is the most efficient way of getting the records where the id is in the array?
I don't really want to chain OR conditions such as WHERE id = 13 OR id = 92, as the array could be extremely long. I've tried this:
select * from authors where id <# [11, 8, 51, 62, 7];
but that's not correct.
Thanks
Use any:
select *
from authors
where id = any (array[11, 8, 51, 62, 7]);
I'm working on my first TensorFlow model, and when I trained it on my dataset the accuracy dropped to 25%, from around 60% with scikit-learn. A friend told me it might have to do with some of the data, for example, "781C376B-E380-C052-448B-B4AB6F3D". How do I deal with symbols (dashes here), numbers, and letters in my data when running my models?
Currently I am looking into text vectorization so the model can read my data more easily.
You can use tf.strings.unicode_decode(), which converts an encoded string scalar to a vector of code points. It returns the Unicode code point of each character in the string, so the same character always maps to the same number.
For example:
import tensorflow as tf

# A batch of Unicode strings, each represented as a UTF-8-encoded string.
batch_utf8 = [s.encode('UTF-8') for s in
              [u'781C376B-E380-C052-448B-B4AB6F3D']]
# Decode each string into a ragged row of Unicode code points.
batch_chars_ragged = tf.strings.unicode_decode(batch_utf8,
                                               input_encoding='UTF-8')
for sentence_chars in batch_chars_ragged.to_list():
    print(sentence_chars)
Output: [55, 56, 49, 67, 51, 55, 54, 66, 45, 69, 51, 56, 48, 45, 67, 48, 53, 50, 45, 52, 52, 56, 66, 45, 66, 52, 65, 66, 54, 70, 51, 68]
For more details please refer to this document. Thank You.
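As a quick sanity check, the code points listed above can be reproduced without TensorFlow. This is only a sketch using Python's built-in ord() and chr(); the variable names are made up for illustration.

```python
# Plain-Python check of the code points reported above (no TensorFlow needed).
sample_id = "781C376B-E380-C052-448B-B4AB6F3D"

# ord() returns the Unicode code point of a single character.
code_points = [ord(ch) for ch in sample_id]
print(code_points)

# The mapping is reversible, so no information is lost by the conversion.
restored = "".join(chr(cp) for cp in code_points)
print(restored == sample_id)
```

Digits, letters, and the dash each get their own number here (the dash is 45), so mixed symbols are not a problem for this kind of encoding.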
OK, need some help here! I have the following dataframe.
df2 = {'Value': [123, 126, 120, 121, 123, 126, 120, 121, 123, 126],
'Look-back': [2, 3, 4, 5, 3, 6, 2, 4, 2, 1]}
df2 = pd.DataFrame(df2)
df2
I'd like to add a third column that shows the simple moving average of the 'Value' column, using the value in the 'Look-back' column as the rolling look-back period for each row. My thought was to do this.
df2['Average'] = df2['Value'].rolling(df2['Look-back']).mean()
Of course this doesn't work because the rolling() function needs an integer window and I'm supplying a series.
How do I get what I'm after here?
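One possible approach, sketched here under assumptions, is to compute each row's mean explicitly from the last 'Look-back' values. The helper name variable_sma is hypothetical, and returning NaN when fewer rows exist than the look-back asks for is an assumption about the desired behaviour, not something stated in the question.

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame({
    'Value': [123, 126, 120, 121, 123, 126, 120, 121, 123, 126],
    'Look-back': [2, 3, 4, 5, 3, 6, 2, 4, 2, 1],
})

def variable_sma(values, windows):
    """Mean of the last w values ending at each row; NaN if fewer than w rows exist."""
    out = []
    for i, w in enumerate(windows):
        start = i - w + 1  # first row of this row's window
        if start < 0:
            out.append(np.nan)  # assumption: not enough history yields NaN
        else:
            out.append(float(np.mean(values[start:i + 1])))
    return out

df2['Average'] = variable_sma(df2['Value'].to_numpy(),
                              df2['Look-back'].to_numpy())
print(df2)
```

A plain loop is used because each row has its own window length; for long frames a vectorised version based on cumulative sums would likely be faster.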
I need to eliminate the rows in a dataframe whose value in a given column also appears in the same column of a second dataframe.
The columns the code has to take into account contain IDs of subjects, while the rest contain data referring to those subjects.
Example of dataframes (Rstudio)
df1<-data.frame(ID=c(13, 16, 25, 36, 25, 17, 50, 63, 61, 34, 65, 17), AnyData=round(runif(12, 1, 5)))
df2<-data.frame(ID=c(89, 57, 13, 17, 18, 21, 51, 50, 72, 84), AnyData=round(runif(10, 1, 5)))
I have tried two functions
df1<- filter(df1, ID!=df2[ID])
df1<- df1[-c(which(df1[ID]==df2[ID]))]
The result should be:
df1 <- data.frame(ID=c(16, 25, 36, 25, 63, 61, 34, 65), AnyData=(...))
AnyData depends on the values assigned by runif, so it will vary, but each value must be the same as in the original df1.
What you need is anti_join():
library(dplyr)
df1 %>%
  anti_join(df2, by = "ID")
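For comparison, the same anti-join idea can be sketched in pandas with isin(). This is not the R solution above, only an illustration of the filter it performs; the AnyData values are hypothetical placeholders standing in for the random runif output.

```python
import pandas as pd

df1 = pd.DataFrame({'ID': [13, 16, 25, 36, 25, 17, 50, 63, 61, 34, 65, 17]})
df2 = pd.DataFrame({'ID': [89, 57, 13, 17, 18, 21, 51, 50, 72, 84]})

# Hypothetical payload column, so it is visible that the data travels with its row.
df1['AnyData'] = range(len(df1))

# Keep only the rows of df1 whose ID does not occur anywhere in df2['ID'].
kept = df1[~df1['ID'].isin(df2['ID'])]
print(kept)
```

As with anti_join(), duplicated IDs in df1 (such as 25) are kept as long as they do not appear in df2.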
I have a csv that looks like the image below. I want to calculate the percentiles (10, 50, 90) of each row, from B2 to X2, and add each resulting percentile as a new column. Essentially, I want to find the 10th percentile of the average (std, cv, sp_tim.....) value over the entire period of record available.
So far I have written the following line to read it into Python as a dataframe.
da = pd.read_csv('Project/11433300_annual_flow_matrix.csv', index_col=0, parse_dates=True)
If I have understood your question correctly, the code below might help.
I have used some dummy data and treated it the way you describe.
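import numpy as np
import pandas as pd

aq = [1, 2, 2, 3, 3, 4, 4, 5, 7, 8, 10, 11]
aw = [91, 25, 13, 53, 95, 94, 75, 35, 57, 88, 111, 12]
df = pd.DataFrame({'aq': aq, 'aw': aw})
n = df.shape[0]
p = 0.1  # for the 10th percentile
# Nearest-rank method: the 1-based rank is ceil(n * p), so subtract 1 for iloc
position = int(np.ceil(n * p)) - 1
# A percentile is a position in the sorted values, so sort the column first
df['aw'].sort_values().iloc[position]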
Kindly have a look and let me know if this works for you.
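For the row-wise case described in the question, one option is DataFrame.quantile with axis=1. The sketch below uses a small made-up frame in place of the real CSV (the year columns and numbers are hypothetical), and pandas interpolates linearly between values by default, which may differ from a nearest-rank percentile.

```python
import pandas as pd

# Hypothetical stand-in for the CSV: one row per metric, one column per year.
da = pd.DataFrame(
    {
        "1990": [10.0, 1.0],
        "1991": [20.0, 2.0],
        "1992": [30.0, 3.0],
        "1993": [40.0, 4.0],
        "1994": [50.0, 5.0],
    },
    index=["average", "std"],
)

year_cols = list(da.columns)

# quantile(axis=1) works across the columns of each row; the result has one
# row per quantile, so transpose it to line up with the original rows.
pcts = da[year_cols].quantile([0.1, 0.5, 0.9], axis=1).T
pcts.columns = ["p10", "p50", "p90"]

result = da.join(pcts)
print(result)
```

Selecting year_cols before calling quantile keeps the new percentile columns out of the calculation if the snippet is run again on the joined frame.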
I'm surprised how few are the posts relating to this problem. Anyway...
here it is:
I have csv data files containing X values in the first column, and several Y values columns thereafter. But for a given X value not all Y series have a corresponding value. Here is an example:
0, 16, 96, 99
10, 88, 45, 85
20, 85, 61, 10
30, 30, --, 45
40, 82, 28, 82
50, 23, 9, 61
60, 40, 77, 0
70, 26, 21, --
80, --, 58, 99
90, 1, 14, 30
When this csv data is loaded with numpy.genfromtxt, the '--' strings are taken as nan, which is good. But when plotting, the lines are interrupted with gaps wherever there is a nan. Is there an option to make pyplot.plot() ignore both the nan and the corresponding X value?
Not sure if matplotlib has such functionality built in, but you could home-brew it by doing the following:
idx = ~numpy.isnan(Y)
pyplot.plot(X[idx], Y[idx])
Look at this post
As proposed in my answer there, I'd recommend using np.isfinite instead of np.isnan. There might be other reasons for your plot to have discontinuities, e.g., inf.
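A minimal sketch of that masking step, using the X column and the second Y series from the sample data in the question. The names X_clean and Y_clean are made up here, and the actual plotting call is left out so the snippet only needs numpy.

```python
import numpy as np

# X values and the second Y series from the sample data; '--' became nan.
X = np.arange(0, 100, 10, dtype=float)
Y = np.array([96, 45, 61, np.nan, 28, 9, 77, 21, 58, 14], dtype=float)

# isfinite is False for nan and for +/-inf, so both kinds of gap are dropped.
idx = np.isfinite(Y)
X_clean, Y_clean = X[idx], Y[idx]

print(X_clean)
print(Y_clean)
```

Passing X_clean and Y_clean to pyplot.plot() then draws one continuous line, joining the points on either side of the removed X = 30 entry. Each Y series needs its own mask, since the missing values fall on different rows.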