Oracle SQL error: maximum number of expressions

Could you please help me with this issue? I'm getting the following error in Oracle SQL:
ORA-01795: maximum number of expressions in a list is 1000
I'm passing values like:
and test in (1, 2, 3, ..., 1000)

Try splitting your query into multiple in clauses, like below:
SELECT *
FROM table_name
WHERE test IN (1, 2, 3, ..., 500)
   OR test IN (501, 502, ..., 1000);

You can try these workarounds:
Split the single in into several ones:
select ...
from ...
where test in (1, ..., 999) or
      test in (1000, ..., 1999) or
      ...
      test in (9000, ..., 9999)
Put the values into a (temporary?) table, say TestTable:
select ...
from ...
where test in (select TestField
               from TestTable)
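For instance, a minimal Python sketch of the second workaround, assuming the cx_Oracle driver and an already-created table TestTable(TestField); the connection string and table names are placeholders for illustration:

import cx_Oracle

ids = list(range(1, 5001))  # more values than the 1000-expression limit allows
conn = cx_Oracle.connect("user/password@host/service")  # placeholder credentials
cur = conn.cursor()
# bulk-load the ids, then filter through a subquery instead of a literal list
cur.executemany("insert into TestTable (TestField) values (:1)",
                [(i,) for i in ids])
cur.execute("select * from table_name "
            "where test in (select TestField from TestTable)")
rows = cur.fetchall()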
Edit: As I can see, the main difficulty is building such a query. Let's implement it in C#. We are given a collection of ids:
// Test case: ids are in the [1..43] range
IEnumerable<int> Ids = Enumerable.Range(1, 43);
// Test case: 7; in an actual Oracle query you'd probably set it to 100 or 1000
int chunkSize = 7;
string fieldName = "test";
string filterText = string.Join(" or " + Environment.NewLine, Ids
    .Select((value, index) => new {
        value = value,
        index = index
    })
    .GroupBy(item => item.index / chunkSize)
    .Select(chunk =>
        $"{fieldName} in ({string.Join(", ", chunk.Select(item => item.value))})"));
if (!string.IsNullOrEmpty(filterText))
    filterText = $"and \r\n({filterText})";
string sql =
    $@"select MyField
from MyTable
where (1 = 1) {filterText}";
Test:
Console.Write(sql);
Outcome:
select MyField
from MyTable
where (1 = 1) and
(test in (1, 2, 3, 4, 5, 6, 7) or
test in (8, 9, 10, 11, 12, 13, 14) or
test in (15, 16, 17, 18, 19, 20, 21) or
test in (22, 23, 24, 25, 26, 27, 28) or
test in (29, 30, 31, 32, 33, 34, 35) or
test in (36, 37, 38, 39, 40, 41, 42) or
test in (43))
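For reference, the same chunked filter can be sketched in Python; the names mirror the C# sample above and are illustrative only:

ids = list(range(1, 44))   # test case: ids in [1..43]
chunk_size = 7
field_name = "test"
# break the ids into fixed-size chunks, one in clause per chunk
chunks = [ids[i:i + chunk_size] for i in range(0, len(ids), chunk_size)]
filter_text = " or\n".join(
    f"{field_name} in ({', '.join(map(str, chunk))})" for chunk in chunks)
if filter_text:
    filter_text = f"and\n({filter_text})"
sql = f"select MyField\nfrom MyTable\nwhere (1 = 1) {filter_text}"
print(sql)  # same shape of output as the C# version above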

Related

How to correctly format a pd-multiindex for sktime?

I have a pd.MultiIndex which looks like this:
However, when I run check_raise(df_train, mtype="pd-multiindex") I get the following error:
File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/sktime/datatypes/_check.py:252, in check_raise(obj, mtype, scitype, var_name)
250 return True
251 else:
--> 252 raise TypeError(msg)
TypeError: input.loc[i] must be Series of mtype pd.DataFrame, not at i=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
I believe this means I am meant to convert each row into a pandas Series, but I am unsure whether this is correct.
Any help would be appreciated.
I had a similar issue; try checking whether your index has duplicate keys. In your case:
df_train.reset_index(['sbj', 'system_time_stamp'])[['sbj', 'system_time_stamp']].duplicated(keep=False)
Removing the duplicated index entries worked for me.
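For example, a short sketch of that check-and-fix, assuming df_train carries the ('sbj', 'system_time_stamp') MultiIndex from the question:

dupes = df_train.index.duplicated(keep=False)
print(df_train[dupes])  # inspect every row that shares an index key
df_train = df_train[~df_train.index.duplicated(keep='first')]  # keep one row per key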

Is there a faster method using Numpy instead of Pandas groupby to calculate the cumulative mean?

I am trying to calculate, as time-efficiently as possible, the cumulative mean of each stat column for each Player when they play a specific Position. However, as for this specific application I am only interested in past performance, I need to exclude the current value (shift the data once). I have created a list of all the stat columns, named Stats, to save time instead of looping through them.
Right now I am using the Pandas groupby function, which is sadly too slow for my data. Although the title suggests Numpy as an alternative, I am really just after the absolute fastest method.
This is my current code, with a minimal reproducible example:
import pandas as pd
import numpy as np

df = pd.DataFrame({'Player': ['Sam', 'Bob', 'Amy', 'Sam', 'Bob', 'Amy', 'Sam', 'Bob', 'Amy',
                              'Sam', 'Bob', 'Amy', 'Sam', 'Bob', 'Amy', 'Sam', 'Bob', 'Amy'],
                   'Position': ['Off', 'Def', 'Def', 'Def', 'Off', 'Def', 'Def', 'Off', 'Off',
                                'Off', 'Def', 'Def', 'Def', 'Off', 'Def', 'Def', 'Off', 'Off'],
                   'Stat A': [10, 20, 30, 25, 15, 10, 20, 20, 15, 15, 25, 35, 20, 10, 15, 25, 25, 10],
                   'Stat B': [15, 25, 35, 20, 10, 15, 25, 25, 10, 10, 20, 30, 25, 15, 10, 20, 20, 15]})
Stats = ['Stat A', 'Stat B']

dfgroupby = df[['Player', 'Position']]
dfshift1 = df.groupby(['Player', 'Position'])[Stats].shift(1)
dfshift2 = pd.concat([dfgroupby, dfshift1], axis=1)
dfcumsum = dfshift2.groupby(['Player', 'Position'])[Stats].cumsum()
dfcumcount1 = dfshift2.groupby(['Player', 'Position'])[Stats].cumcount()
dfcumcount2 = pd.concat([dfcumcount1] * len(Stats), axis=1)
dfcummean1 = pd.DataFrame(dfcumsum.values / dfcumcount2.values, columns=Stats).add_suffix(' - CumMean')
dfcummean2 = pd.concat([dfgroupby, dfcummean1], axis=1)
dfcummean2
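One possible NumPy direction (a sketch under stated assumptions, not a benchmarked answer) is to sort the rows so each (Player, Position) group is contiguous, compute running sums with a single cumsum, and divide by each row's rank within its group. It reuses df, Stats, pd and np from the snippet above:

# encode each (Player, Position) pair as an integer group id
codes, _ = pd.factorize(df['Player'].astype(str) + '|' + df['Position'].astype(str))
vals = df[Stats].to_numpy(dtype=float)

order = np.argsort(codes, kind='stable')                 # group rows together, keep time order
g, v = codes[order], vals[order]
starts = np.flatnonzero(np.r_[True, g[1:] != g[:-1]])    # first row of each group
counts = np.diff(np.r_[starts, len(g)])
pos = np.arange(len(g)) - np.repeat(starts, counts)      # 0-based rank within the group

csum = np.cumsum(v, axis=0)
prev = np.vstack([np.zeros((1, v.shape[1])), csum[:-1]]) # sum of all rows above
base = np.repeat(csum[starts] - v[starts], counts, axis=0)  # sum accumulated before the group
with np.errstate(invalid='ignore', divide='ignore'):
    cmean = (prev - base) / pos[:, None]                 # NaN on each group's first row

out = np.empty_like(cmean)
out[order] = cmean                                       # restore the original row order
dfcummean_np = pd.DataFrame(out, columns=[s + ' - CumMean' for s in Stats])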

This prime generating function using generateSequence in Kotlin is not easy to understand. :(

val primes = generateSequence(2 to generateSequence(3) { it + 2 }) {
    val currSeq = it.second.iterator()
    val nextPrime = currSeq.next()
    nextPrime to currSeq.asSequence().filter { it % nextPrime != 0 }
}.map { it.first }
println(primes.take(10).toList()) // prints [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
I tried to understand how this function works, but it's not easy for me.
Could someone explain it? Thanks.
It generates an infinite sequence of primes using the "Sieve of Eratosthenes" (see here: https://en.wikipedia.org/wiki/Sieve_of_Eratosthenes).
This implementation uses a sequence of pairs to do this. The first element of every pair is the current prime, and the second element is a sequence of integers larger than that prime which are not divisible by any of the previous primes.
It starts with the pair 2 to [3, 5, 7, 9, 11, 13, 15, 17, ...], which is given by 2 to generateSequence(3) { it + 2 }.
Using this pair, we create the next pair of the sequence by taking the first element of the sequence (which is now 3), and then removing all numbers divisible by 3 from the sequence (removing 9, 15, 21 and so on). This gives us this pair: 3 to [5, 7, 11, 13, 17, ...]. Repeating this pattern will give us all primes.
After creating a sequence of pairs like this, we are finally doing .map { it.first } to pick only the actual primes, and not the inner sequences.
The sequence of pairs will evolve like this:
2 to [3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, ...]
3 to [5, 7, 11, 13, 17, 19, 23, 25, 29, ...]
5 to [7, 11, 13, 17, 19, 23, 29, ...]
7 to [11, 13, 17, 19, 23, 29, ...]
11 to [13, 17, 19, 23, 29, ...]
13 to [17, 19, 23, 29, ...]
// and so on
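For comparison, the same idea can be sketched with Python generators (a rough analogue of the Kotlin code, not an exact translation of its sequence semantics):

from itertools import count, islice

def primes():
    # take the next prime off the front, then lazily filter out its multiples,
    # mirroring the (prime, filtered-sequence) pairs in the Kotlin version
    def sieve(seq):
        p = next(seq)
        yield p
        yield from sieve(x for x in seq if x % p != 0)
    return sieve(count(2))

print(list(islice(primes(), 10)))  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]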

How many unique values are there? Put it all in a table

I would like to query in SQL how many unique values there are and how many rows there are. How do I do this in SQL so that I get a result like the one at the bottom?
In Python I could do the following:
import pandas as pd

d = {'sellerid': [1, 1, 1, 2, 2, 3, 3, 3],
     'modelnumber': [85, 45, 85, 12, 85, 74, 85, 12],
     'modelgroup': [2, 3, 2, 1, 2, 3, 2, 1]}
df = pd.DataFrame(data=d)
display(df.head(10))
df['Dataframe'] = 'df'
unique_sellerid = df['sellerid'].nunique()
print("unique_sellerid", unique_sellerid)
unique_modelnumber = df['modelnumber'].nunique()
print("unique_modelnumber", unique_modelnumber)
unique_modelgroup = df['modelgroup'].nunique()
print("unique_modelgroup", unique_modelgroup)
total_rows = df.shape[0]
print("total_rows", total_rows)
[OUT]
unique_sellerid 3
unique_modelnumber 4
unique_modelgroup 3
total_rows 8
I want a query like
Here is the dummy table
CREATE TABLE cars (
    sellerid INT NOT NULL,
    modelnumber INT NOT NULL,
    modelgroup INT
);
INSERT INTO cars
    (sellerid, modelnumber, modelgroup)
VALUES
(1, 85, 2),
(1, 45, 3),
(1, 85, 2),
(2, 12, 1),
(2, 85, 2),
(3, 74, 3),
(3, 85, 2),
(3, 12, 1);
You could use the count(distinct column) aggregation function; applied to the cars table from the question:
select
    count(distinct sellerid)    as unique_sellerid,
    count(distinct modelnumber) as unique_modelnumber,
    count(distinct modelgroup)  as unique_modelgroup,
    count(*)                    as total_rows
from cars;
Also, in pandas you can apply the nunique() function to the whole dataset rather than doing it column by column: df.nunique()
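For instance, a quick check with the df built in the question above:

print(df[['sellerid', 'modelnumber', 'modelgroup']].nunique())
# sellerid       3
# modelnumber    4
# modelgroup     3
print(len(df))  # total_rows: 8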

Inserting new fields(columns) to mongoDB with pandas

I have existing data in MongoDB where the primary key is set on 'date', with a few fields in it.
I want to insert a new pandas dataframe with new fields (columns) into the existing data in MongoDB, joining on the 'date' field which exists in both dataframes.
For example, let's say this is dataframe A that I have in MongoDB (I set the index to the 'date' field when reading the data from MongoDB).
And this is the new dataframe B I want to insert into MongoDB.
And this is the final dataframe C, with the new fields ('std_50_3000window', 'std_50_300window', 'std_50_500window' added on the 'date' index), which I want to end up with in MongoDB.
Is there any way to do this? (Maybe with the insert_many method?)
The method you need is update_one() with upsert=True, in a loop. You can't use insert_many() for two reasons: first, you're not always inserting, sometimes you're updating; second, update_many() (and insert_many()) only work with a single filter, and in your case each filter is different, as each update relates to a different time.
This is a generic solution that will combine dataframes (df_a and df_b in this case; you can have as many as you like) in the manner you need. It uses iterrows to get each row of the dataframe, filters on the date, and sets the values to those in the dataframe. The $set operator will override values if they are already there and set them if not. upsert=True will perform an insert if there's no match on the date.
for df in [df_a, df_b]:
    for _, row in df.iterrows():
        db.mycollection.update_one({'date': row.get('date')}, {'$set': row.to_dict()}, upsert=True)
Full worked example:
from pymongo import MongoClient
from pprint import pprint
import datetime
import pandas as pd

# Sample data setup
db = MongoClient()['mydatabase']

data_a = [[datetime.datetime(2017, 5, 19, 21, 20), 96, 8, 98],
          [datetime.datetime(2017, 5, 19, 21, 21), 95, 8, 97],
          [datetime.datetime(2017, 5, 19, 21, 22), 95, 8, 97]]
df_a = pd.DataFrame(data_a, columns=['date', 'std_500_1000window', 'std_50_100window', 'std_50_2000window'])

data_b = [[datetime.datetime(2017, 5, 19, 21, 20), 98, 9, 10],
          [datetime.datetime(2017, 5, 19, 21, 21), 98, 9, 10],
          [datetime.datetime(2017, 5, 19, 21, 22), 98, 9, 10]]
df_b = pd.DataFrame(data_b, columns=['date', 'std_50_3000window', 'std_50_300window', 'std_50_500window'])

# Perform the upserts
for df in [df_a, df_b]:
    for _, row in df.iterrows():
        db.mycollection.update_one({'date': row.get('date')}, {'$set': row.to_dict()}, upsert=True)

# Print the results
for record in db.mycollection.find():
    pprint(record)
Result:
{'_id': ObjectId('5f0ae909df5531ac655ce528'),
'date': datetime.datetime(2017, 5, 19, 21, 20),
'std_500_1000window': 96,
'std_50_100window': 8,
'std_50_2000window': 98,
'std_50_3000window': 98,
'std_50_300window': 9,
'std_50_500window': 10}
{'_id': ObjectId('5f0ae909df5531ac655ce52a'),
'date': datetime.datetime(2017, 5, 19, 21, 21),
'std_500_1000window': 95,
'std_50_100window': 8,
'std_50_2000window': 97,
'std_50_3000window': 98,
'std_50_300window': 9,
'std_50_500window': 10}
{'_id': ObjectId('5f0ae909df5531ac655ce52c'),
'date': datetime.datetime(2017, 5, 19, 21, 22),
'std_500_1000window': 95,
'std_50_100window': 8,
'std_50_2000window': 97,
'std_50_3000window': 98,
'std_50_300window': 9,
'std_50_500window': 10}