Fitting Multiple Input Columns in the KNN Algorithm Gives ValueError: setting an array element with a sequence

I have 2 input columns: the first is binary (zero or one) and the second is a feature vector of size 100. I want to fit these 2 columns to a KNN model in order to predict the category column. I already applied one-hot encoding to the category column, which produced 15 extra columns (one per category).
When I fit the model it shows the following error:
ValueError: setting an array element with a sequence.
This is a part of my code:
X_level1 = np.asarray(dfCopy[['inputColumn1','inputColumn2']])
y_level1 = np.asarray(dfCopy[['OneHotEncodingColumn1','OneHotEncodingColumn2','OneHotEncodingColumn3',...,'OneHotEncodingColumn15']])
X_train1, X_val1, y_train1, y_val1 = train_test_split(X_level1, y_level1, test_size = 0.2, random_state=20)
This is a part of my input data:
array([[array([ 0.41164917, 0.33110523, -0.7823772 , 0.12783737, 1.1618725 ,
-0.7024268 , 0.84284127, 1.5140213 , 0.64215165, -1.6586455 ,
0.46136633, -0.92533016, 0.50660706, 1.0788306 , -0.9702446 ,
0.6586883 , 1.7500123 , -0.15637057, 1.4345818 , -1.9476864 ,
0.6294452 , 0.12649943, -2.3380706 , 0.61786395, -0.45559853,
-0.5325301 , 1.2698289 , -1.649353 , -0.18185338, 1.4399352 ,
1.9842219 , -0.11131181, 0.42542225, -1.3662227 , 0.57311517,
3.4422836 , -0.9965432 , -0.58612174, -0.5525687 , -2.5889783 ,
-0.8159157 , -1.8203335 , -0.58147144, 2.3315256 , 0.42271224,
-1.3675721 , -0.87182087, 0.6811211 , -1.5281016 , 1.0560112 ,
1.7546124 , 1.3516003 , 0.05760164, 0.4792729 , 0.20388177,
2.0917022 , 0.26405442, -1.012274 , -0.7311924 , -0.4222189 ,
-0.15046267, 1.838553 , -0.9228903 , -0.25226635, -2.7405736 ,
1.0562496 , 0.08701825, 0.42543337, 0.2115567 , 1.3348918 ,
-0.54058945, 1.2874343 , 0.72596663, -2.399423 , 1.7278377 ,
1.3298786 , -0.6601989 , 0.55112255, -0.60255444, 2.2411568 ,
0.31967035, 1.7551464 , -0.70625794, -1.2612839 , -0.82214457,
1.3652881 , -1.1309841 , 0.3563959 , 1.92157 , 0.9091741 ,
-0.09321591, 0.09579365, 0.87175727, 0.2785632 , 1.8571266 ,
-0.93616605, -0.09428027, 0.5034914 , 0.55093 , 1.0682331 ],
dtype=float32),
1], ...], dtype=object)
And this is part of the output data:
array([[0, 0, 0, ..., 0, 0, 0],
...,
[0, 0, 0, ..., 0, 1, 0]], dtype=uint8)

Try converting your input from 2 columns into 101 columns (one column for each feature). Also make sure the number of input rows equals the number of output rows, and that every row has the same number of features.
I think the model is trying (during training) to multiply the nested array by the weights.
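A minimal sketch of that conversion, assuming the feature vectors are stored as NumPy arrays inside an object-dtype column (the data below is synthetic; the column names simply mirror the question):

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for dfCopy: a binary column plus an object-dtype
# column holding one 100-element float32 vector per row.
rng = np.random.default_rng(0)
dfCopy = pd.DataFrame({
    'inputColumn1': [0, 1, 1],
    'inputColumn2': [rng.normal(size=100).astype(np.float32) for _ in range(3)],
})

# np.stack turns the column of arrays into a regular (n, 100) matrix;
# np.hstack then prepends the binary column, giving one plain numeric
# column per feature instead of an array nested inside a single cell.
vectors = np.stack(dfCopy['inputColumn2'].to_numpy())      # shape (3, 100)
binary = dfCopy['inputColumn1'].to_numpy().reshape(-1, 1)  # shape (3, 1)
X_level1 = np.hstack([binary, vectors])                    # shape (3, 101)
print(X_level1.shape)
```

The resulting X_level1 has a plain numeric dtype, so KNeighborsClassifier.fit no longer encounters a cell that holds a whole sequence.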

Related

Applying when condition only when column exists in the dataframe

I am using spark-sql 2.4.1 with Java 8. I have a scenario where I need to perform certain operations only if a column is present in the given dataframe's column list.
I have a sample dataframe as below; the columns of the dataframe differ based on the external query executed on the database table.
val data = List(
("20", "score", "school", "2018-03-31", 14 , 12 , 20),
("21", "score", "school", "2018-03-31", 13 , 13 , 21),
("22", "rate", "school", "2018-03-31", 11 , 14, 22),
("21", "rate", "school", "2018-03-31", 13 , 12, 23)
)
val df = data.toDF("id", "code", "entity", "date", "column1", "column2" ,"column3"..."columnN")
As shown above, the columns of the dataframe "data" are not fixed and would vary: "column1", "column2", "column3" ... "columnN".
So, depending on column availability, I need to perform some operations. For this I am trying to use a when-clause: when a column is present I have to perform a certain operation on that column, else move on to the next operation.
I am trying the two ways below using a when-clause.
First-way :
Dataset<Row> resultDs = df.withColumn("column1_avg",
when( df.schema().fieldNames().contains(col("column1")) , avg(col("column1"))))
)
Second-way :
Dataset<Row> resultDs = df.withColumn("column2_sum",
when( df.columns().contains(col("column2")) , sum(col("column1"))))
)
Error:
Cannot invoke contains(Column) on the array type String[]
So how do I handle this scenario in Java 8?
You can create a column holding all of the column names; then you can check whether a column is present and process it if it is available:
df.withColumn("columns_available", array(df.columns.map(lit): _*))
.withColumn("column1_org",
when( array_contains(col("columns_available"),"column1") , col("column1")))
.withColumn("x",
when( array_contains(col("columns_available"),"column4") , col("column1")))
.withColumn("column2_new",
when( array_contains(col("columns_available"),"column2") , sqrt("column2")))
.show(false)
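The root cause of the error is that df.columns() returns a plain String[], so membership must be tested with a string, not a Column. That driver-side check needs no Spark functions at all; here is the idea sketched with pandas as a stand-in (a Spark session is not available here, and the frame and values are made up):

```python
import pandas as pd

# Hypothetical frame where only some of column1..columnN are present.
df = pd.DataFrame({'id': ['20', '21'],
                   'code': ['score', 'score'],
                   'column1': [14, 13]})

# Test membership against plain column-name strings, then build the
# derived column only when its source column actually exists.
if 'column1' in df.columns:
    df['column1_avg'] = df['column1'].mean()
if 'column4' in df.columns:          # absent, so this branch is skipped
    df['column4_avg'] = df['column4'].mean()

print(sorted(df.columns))
```

The same string-membership test works in Java against Arrays.asList(df.columns()), avoiding the when/array_contains machinery when the decision can be made before building the query.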

using multiple parameters in append query in Access 2010

I have been trying to get an append query to work, but I keep getting an error stating that 0 rows are being appended whenever I use more than one parameter in the query. This is for a
The table in question has one PK, a GUID (which is generating values with newid()), and one required field (Historical), which I am explicitly defining in the query.
INSERT INTO dbo_sales_quotas ( salesrep_id
, [year]
, territory_id
, sales_quota
, profit_quota
, product_super_group_uid
, product_super_group_desc
, class_9
, Historical
, sales_quotas_UID )
SELECT dbo_sales_quotas.salesrep_id
, dbo_sales_quotas.Year
, dbo_sales_quotas.territory_id
, dbo_sales_quotas.sales_quota
, dbo_sales_quotas.profit_quota
, dbo_sales_quotas.product_super_group_uid
, dbo_sales_quotas.product_super_group_desc
, dbo_sales_quotas.class_9
, dbo_sales_quotas.Historical
, dbo_sales_quotas.sales_quotas_UID
FROM dbo_sales_quotas
WHERE (((dbo_sales_quotas.salesrep_id)=[cboSalesRepID])
AND ((dbo_sales_quotas.Year)=[txtYear])
AND ((dbo_sales_quotas.territory_id)=[txtTerritoryID])
AND ((dbo_sales_quotas.sales_quota)=[txtSalesQuota])
AND ((dbo_sales_quotas.profit_quota)=[txtProfitQuota])
AND ((dbo_sales_quotas.product_super_group_uid)=[cboProdSuperGroup])
AND ((dbo_sales_quotas.product_super_group_desc)=[txtProductSuperGroupDesc])
AND ((dbo_sales_quotas.class_9)=[cboClass9])
AND ((dbo_sales_quotas.Historical)='No')
AND ((dbo_sales_quotas.sales_quotas_UID)='newid()'));
Even if I assign specific values, I still get a 0 rows error unless I reduce the number of parameters to 1 (it then works perfectly, regardless of which parameter I keep). I have verified that the parameters have the correct formats.
Can anyone tell me what I'm doing wrong?
Break out the SELECT part of your query and examine it separately. I'll suggest a simplified version which may be easier to study ...
SELECT
dsq.salesrep_id,
dsq.Year,
dsq.territory_id,
dsq.sales_quota,
dsq.profit_quota,
dsq.product_super_group_uid,
dsq.product_super_group_desc,
dsq.class_9,
dsq.Historical,
dsq.sales_quotas_UID
FROM dbo_sales_quotas AS dsq
WHERE
dsq.salesrep_id=[cboSalesRepID]
AND dsq.Year=[txtYear]
AND dsq.territory_id=[txtTerritoryID]
AND dsq.sales_quota=[txtSalesQuota]
AND dsq.profit_quota=[txtProfitQuota]
AND dsq.product_super_group_uid=[cboProdSuperGroup]
AND dsq.product_super_group_desc=[txtProductSuperGroupDesc]
AND dsq.class_9=[cboClass9]
AND dsq.Historical='No'
AND dsq.sales_quotas_UID='newid()';
I wonder about the last 2 conditions in the WHERE clause. Is the Historical field type bit instead of text? Does the string 'newid()' match sales_quotas_UID in any rows in the table?

Find tuple element in a list with anonymous values

I want to find in this list:
test = [ (1,1,1,0) , (1,1,1,1) , (1,3,1,0) , (1,4,2,0) , (1,5,2,0) , (1,6,2,0) ,
(3,1,3,5) , (3,2,3,4) , (3,3,3,3) , (3,4,4,1) , (3,5,4,2) , (3,6,4,6) ,
(2,1,1,2) , (2,2,1,5) , (2,3,1,0) , (2,4,2,4) , (2,5,2,1) , (2,6,2,0) ,
(4,1,3,0) , (4,2,3,0) , (4,3,3,0) , (4,4,4,0) , (4,5,4,0) , (4,6,4,0) ,
(5,1,5,1) , (5,2,5,6) , (5,3,5,0) , (5,4,6,2) , (5,5,6,3) , (5,6,6,0) ,
(6,1,5,3) , (6,2,5,2) , (6,3,5,4) , (6,4,6,5) , (6,5,6,6) , (6,6,6,1) ]
The tuples with anonymous elements, like (1,1,X,X), where X can be any value:
*> find (==(1,1,1,0)) test
Just (1,1,1,0)
I want to be able to do:
*> find (==(1,1,X,X)) test
(1,1,1,0)
(1,1,1,1)
The actual question is: is there any kind of anonymous variable (like _ in Prolog) that matches any value?
Use filter and pattern matching.
Prelude> :t filter
filter :: (a -> Bool) -> [a] -> [a]
Takes a function that matches things:
filter (\x -> case x of (1,1,_,_) -> True; _ -> False) ...
You can use a list comprehension.
[x | x@(1,1,_,_) <- test]
This works because when you have a pattern that might fail on the left hand side of <-, values that don't match the pattern are filtered out.

Parsing and getting specific values from CSV string

I have a string in CSV format. Please see my earlier question:
Parsing CSV string and binding it to listbox
I can add new values to it by some mechanism. Everything stays the same, except the new values have a numeric part of 0. For example, I have this existing CSV string:
1 , abc.txt , 2 , def.doc , 3 , flyaway.txt
By some mechanism I added two more files, Superman.txt and Spiderman.txt, to the existing string. Now it became:
1 , abc.txt , 2 , def.doc , 3 , flyaway.txt, 0, Superman.txt, 0 , Spiderman.txt
This CSV string is passed into a stored procedure, where it is split and inserted into the database. For the insert I have to take only the files with a numeric part of 0; the rest will be omitted. Those entries are then converted back into a CSV string.
Array will look like this
str[0]="1"
str[1]="abc.txt"
str[2]="2"
str[3]="def.doc "
str[4]="3"
str[5]="flyaway.txt"
str[6]="0"
str[7]="Superman.txt"
str[8]="0"
str[9]="Spiderman.txt"
So, in the end, my input will be:
1 , abc.txt , 2 , def.doc , 3 , flyaway.txt, 0, Superman.txt, 0 , Spiderman.txt
Desired Output:
0, Superman.txt, 0 , Spiderman.txt
If your new values are always added to the end of the input, all you need to do is search the string for the first 0, then return it and everything after it using string.Substring(Int32).
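A sketch of that idea in Python (the question is .NET, so treat the language as illustrative only): split on commas first, then keep everything from the first "0" entry onward. Splitting before searching avoids accidentally matching a 0 inside a filename, which a raw substring search could do.

```python
csv = "1 , abc.txt , 2 , def.doc , 3 , flyaway.txt, 0, Superman.txt, 0 , Spiderman.txt"

# Split into trimmed tokens, locate the first "0" marker, keep the tail.
parts = [p.strip() for p in csv.split(',')]
first_zero = parts.index('0')
new_entries = parts[first_zero:]
print(', '.join(new_entries))  # 0, Superman.txt, 0, Spiderman.txt
```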

Activerecord, select on string compare

I have a table called Basic. start_time is a field of type VARCHAR(5) which actually stores 5 bytes of time data: bytes 0 and 1 map to the year, month, and day, and bytes 2 to 4 map to the hour, minute, and second. So it is possible that bytes 2, 3, and 4 are all 0. I want to do the following query:
Basic.find (:all , :conditions => "start_time > ? AND end_time < ?" , start_time , end_time)
Here are the questions:
Suppose that, in the VARCHAR(5) format, the start time is [214, 222, 0, 0, 0] (Jun 24th, 2009) and the end time is [214, 223, 0, 0, 0] (Jun 25th, 2009).
Since ActiveRecord maps VARCHAR(5) to String, the start_time and end_time in the above query should also be Strings. What is the correct way to convert the above VARCHAR(5) time into a String?
I did it this way, but it fails to get the correct result:
tmp = [214, 222, 0, 0, 0].map {|t| t.to_s(16)} ; start_time = tmp.to_s
I am using the sqlite3 adapter for ActiveRecord.
Thanks for your help.
I have found where the problem is: "\000" must not appear in start_time when doing the following query:
Basic.find (:all , :conditions => "start_time > ? AND end_time < ?" , start_time , end_time)
So, I need to do two steps:
[214, 222, 0, 0, 0] -> [214, 222]
[214,222] -> "\326\336"
The 1st steps can be done using:
a = [214,222,0,0,0]
while a.last ==0 do a.pop end
The 2nd steps can be done using:
a = [214,222]
a.pack("c" * a.size)
However, I still cannot run the query when start_time = [214, 222, 0, 23, 0], because the corresponding string contains "\000" in the middle. Fortunately, this is not a big problem in our application: we never query at the hour level, so the last two numbers will always be 0.
Is this what you're after?
[214, 222, 0, 0, 0].inject(""){|s,c| s << c.chr; s} # => "\326\336\000\000\000"
As to why I know this...I wanted to say MLM on twitter once, without actually saying it. Because if you actually say it, you get auto-followed by all these MLM spammers. So instead I said
[77, 76, 77].inject(""){|s,c| s << c.chr; s}
In your case, though, it seems like a lot of work just to avoid using a timestamp.