Find tuple element in a list with anonymous values - variables

I want to find tuples in this list:
test = [ (1,1,1,0) , (1,1,1,1) , (1,3,1,0) , (1,4,2,0) , (1,5,2,0) , (1,6,2,0) ,
(3,1,3,5) , (3,2,3,4) , (3,3,3,3) , (3,4,4,1) , (3,5,4,2) , (3,6,4,6) ,
(2,1,1,2) , (2,2,1,5) , (2,3,1,0) , (2,4,2,4) , (2,5,2,1) , (2,6,2,0) ,
(4,1,3,0) , (4,2,3,0) , (4,3,3,0) , (4,4,4,0) , (4,5,4,0) , (4,6,4,0) ,
(5,1,5,1) , (5,2,5,6) , (5,3,5,0) , (5,4,6,2) , (5,5,6,3) , (5,6,6,0) ,
(6,1,5,3) , (6,2,5,2) , (6,3,5,4) , (6,4,6,5) , (6,5,6,6) , (6,6,6,1) ]
I know how to find a tuple with exact values:
*> find (==(1,1,1,0)) test
Just (1,1,1,0)
But I want to match tuples with anonymous elements, like (1,1,X,X), where X can be any value:
*> find (==(1,1,X,X)) test
(1,1,1,0)
(1,1,1,1)
The actual question is: is there any kind of anonymous variable (like "_" in Prolog) to match any value?

Use filter and pattern matching.
Prelude> :t filter
filter :: (a -> Bool) -> [a] -> [a]
filter takes a predicate, which you can write with pattern matching:
filter (\x -> case x of (1,1,_,_) -> True; _ -> False) ...
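Running that predicate over the list from the question (a GHCi sketch; test is the list defined above) returns both matching tuples:
*> filter (\x -> case x of (1,1,_,_) -> True; _ -> False) test
[(1,1,1,0),(1,1,1,1)]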

You can use a list comprehension.
[x | x@(1,1,_,_) <- test]
This works because, when a pattern on the left-hand side of <- can fail, values that don't match the pattern are filtered out.
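For example (the same GHCi sketch, using the comprehension):
*> [x | x@(1,1,_,_) <- test]
[(1,1,1,0),(1,1,1,1)]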

Related

Fitting Multiple Input Columns in KNN Algorithm is Giving ValueError: setting an array element with a sequence

I have 2 input columns: the first column is binary (zero or one) and the second column is a feature vector of size 100. I want to fit these 2 columns in a KNN model to predict the category column. I already applied one-hot encoding to the category column, which produced 15 extra columns (one per category).
When I fit the model it shows the following error:
ValueError: setting an array element with a sequence.
This is a part of my code:
X_level1 = np.asarray(dfCopy[['inputColumn1','inputColumn2']])
y_level1 = np.asarray(dfCopy[['OneHotEncodingColumn1','OneHotEncodingColumn2','OneHotEncodingColumn3',...,'OneHotEncodingColumn15']])
X_train1, X_val1, y_train1, y_val1 = train_test_split(X_level1, y_level1, test_size = 0.2, random_state=20)
This is a part of my input data:
array([[array([ 0.41164917, 0.33110523, -0.7823772 , 0.12783737, 1.1618725 ,
-0.7024268 , 0.84284127, 1.5140213 , 0.64215165, -1.6586455 ,
0.46136633, -0.92533016, 0.50660706, 1.0788306 , -0.9702446 ,
0.6586883 , 1.7500123 , -0.15637057, 1.4345818 , -1.9476864 ,
0.6294452 , 0.12649943, -2.3380706 , 0.61786395, -0.45559853,
-0.5325301 , 1.2698289 , -1.649353 , -0.18185338, 1.4399352 ,
1.9842219 , -0.11131181, 0.42542225, -1.3662227 , 0.57311517,
3.4422836 , -0.9965432 , -0.58612174, -0.5525687 , -2.5889783 ,
-0.8159157 , -1.8203335 , -0.58147144, 2.3315256 , 0.42271224,
-1.3675721 , -0.87182087, 0.6811211 , -1.5281016 , 1.0560112 ,
1.7546124 , 1.3516003 , 0.05760164, 0.4792729 , 0.20388177,
2.0917022 , 0.26405442, -1.012274 , -0.7311924 , -0.4222189 ,
-0.15046267, 1.838553 , -0.9228903 , -0.25226635, -2.7405736 ,
1.0562496 , 0.08701825, 0.42543337, 0.2115567 , 1.3348918 ,
-0.54058945, 1.2874343 , 0.72596663, -2.399423 , 1.7278377 ,
1.3298786 , -0.6601989 , 0.55112255, -0.60255444, 2.2411568 ,
0.31967035, 1.7551464 , -0.70625794, -1.2612839 , -0.82214457,
1.3652881 , -1.1309841 , 0.3563959 , 1.92157 , 0.9091741 ,
-0.09321591, 0.09579365, 0.87175727, 0.2785632 , 1.8571266 ,
-0.93616605, -0.09428027, 0.5034914 , 0.55093 , 1.0682331 ],
dtype=float32),
        1],
       ...], dtype=object)
And this is part of the output data:
array([[0, 0, 0, ..., 0, 0, 0],
...,
[0, 0, 0, ..., 0, 1, 0]], dtype=uint8)
Try converting your input from 2 columns into 101 columns (one column per feature). Make sure the number of input rows equals the number of output rows, and that all rows have the same number of features.
I think the model is trying (during training) to multiply the array by the weights, which fails when a cell holds a whole array instead of a single number.
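A minimal sketch of that conversion (assuming, per the question's description, that inputColumn1 holds the binary flag and inputColumn2 the 100-element vectors; dfCopy and the column names come from the question):
import numpy as np

# Stack the per-row length-100 vectors into an (n_samples, 100) float matrix
vectors = np.vstack(dfCopy['inputColumn2'].to_numpy())

# Binary column as an (n_samples, 1) matrix
binary = dfCopy['inputColumn1'].to_numpy().reshape(-1, 1)

# 101 plain numeric columns -- no nested object arrays, so KNN can fit it
X_level1 = np.column_stack([binary, vectors]).astype(np.float32)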

How to get a nested JSON value from the following table in AWS Athena

Here is our query:
SELECT "count"(*) "count"
, "httprequest"."country"
, "httprequest"."headers"
FROM waf_logs_17
WHERE ("httprequest"."uri" LIKE '/login')
GROUP BY "httprequest"."clientip", "httprequest"."country","httprequest"."headers"
ORDER BY "count" limit 5
The result is as follows:
count | Country | headers
1     | US      | [{name=Host, value=app.onlinecheckwriter.com}, {name=X-Forwarded-For, value=75.113.195.00}, {name=X-Forwarded-Proto, value=https}]
Now the question is: how can we select the value of X-Forwarded-For under "httprequest"."headers"?
Your data is not JSON but an array of rows, so you can process it using array functions:
-- sample data
WITH dataset (headers) AS (
values (array[cast(row('X-Forwarded-For', '75.113.195.00') as ROW(name varchar, value varchar))] )
)
-- query
select
reduce(headers, NULL, (v, curr) -> if(curr.name = 'X-Forwarded-For', curr.value, v), v -> v)
from dataset
Output:
_col0
75.113.195.00
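An equivalent sketch with Presto's filter and element_at array functions (same sample dataset as above); whether it reads better is a matter of taste:
-- keep only the X-Forwarded-For entries, then take the first match's value
select
  element_at(filter(headers, h -> h.name = 'X-Forwarded-For'), 1).value
from dataset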

Dates excluded with a NOT condition are still visible in output

My requirement is that data for a specific date should not be displayed.
Given the SELECT query below, the dates listed in the NOT condition should not be displayed, but in my output I can still see them.
REPORT z_sanjay_internaltables_06.

TABLES: ekko.

SELECT-OPTIONS: s_dates FOR ekko-bedat.

DATA: BEGIN OF wa_ekko,
        wa_ebeln TYPE ekko-ebeln,
        wa_bedat TYPE ekko-bedat,
        wa_lifnr TYPE ekko-lifnr,
        wa_bukrs TYPE ekko-bukrs,
      END OF wa_ekko.
DATA: it_ekko LIKE TABLE OF wa_ekko.

SELECT ebeln bedat lifnr bukrs FROM ekko INTO TABLE it_ekko
  WHERE bedat IN s_dates AND bedat <> '02.11.2000'.

LOOP AT it_ekko INTO wa_ekko.
  WRITE: / wa_ekko-wa_ebeln,
           wa_ekko-wa_lifnr,
           wa_ekko-wa_bukrs,
           wa_ekko-wa_bedat.
ENDLOOP.
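One hedged observation (not part of the original thread): BEDAT is a DATS field, stored internally as YYYYMMDD, so the literal '02.11.2000' can never equal a stored value and excludes nothing. Written in the internal format, the comparison would be:
* DATS values are stored as YYYYMMDD, so compare in that format
SELECT ebeln bedat lifnr bukrs FROM ekko INTO TABLE it_ekko
  WHERE bedat IN s_dates AND bedat <> '20001102'.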

Applying when condition only when column exists in the dataframe

I am using spark-sql 2.4.1 with Java 8. I have a scenario where I need to perform a certain operation only if a column is present in the given dataframe's column list.
I have a sample dataframe as below; the columns of the dataframe differ based on the external query executed on the database table.
val data = List(
("20", "score", "school", "2018-03-31", 14 , 12 , 20),
("21", "score", "school", "2018-03-31", 13 , 13 , 21),
("22", "rate", "school", "2018-03-31", 11 , 14, 22),
("21", "rate", "school", "2018-03-31", 13 , 12, 23)
)
val df = data.toDF("id", "code", "entity", "date", "column1", "column2" ,"column3"..."columnN")
As shown above, the dataframe's columns are not fixed and would vary; it would have "column1", "column2", "column3" ... "columnN" ...
So depending on column availability, I need to perform some operations.
For that I am trying to use a "when" clause: when a column is present, I have to perform a certain operation on that column, else move on to the next operation.
I am trying the two ways below using a "when" clause.
First way:
Dataset<Row> resultDs = df.withColumn("column1_avg",
when( df.schema().fieldNames().contains(col("column1")) , avg(col("column1"))))
)
Second way:
Dataset<Row> resultDs = df.withColumn("column2_sum",
when( df.columns().contains(col("column2")) , sum(col("column1"))))
)
Error:
Cannot invoke contains(Column) on the array type String[]
So how do I handle this scenario using Java 8 code?
You can create a column holding all the column names; then you can check whether a column is present and process it if it is available:
df.withColumn("columns_available", array(df.columns.map(lit): _*))
.withColumn("column1_org",
when( array_contains(col("columns_available"),"column1") , col("column1")))
.withColumn("x",
when( array_contains(col("columns_available"),"column4") , col("column1")))
.withColumn("column2_new",
when( array_contains(col("columns_available"),"column2") , sqrt("column2")))
.show(false)
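Since the question asks for Java 8: the presence check can also be made driver-side, before building the plan. A minimal sketch (df and the column names are from the question; sqrt mirrors the operation used above):
import java.util.Arrays;
import java.util.List;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import static org.apache.spark.sql.functions.*;

// df.columns() is a plain String[], so do the membership test in Java,
// not inside a Column expression
List<String> cols = Arrays.asList(df.columns());

Dataset<Row> resultDs = cols.contains("column2")
        ? df.withColumn("column2_new", sqrt(col("column2")))  // present: apply the operation
        : df;                                                 // absent: leave the frame unchanged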

How to query nested array JSON in Postgres?

I have the following structure of JSON document stored in one of my Postgres tables.
A link to the sample JSON is here.
In that JSON, I have the below structure inside a nested array:
"product_order_reference": {
"purchase_order_number": "0007-8653547-0590"
}
I am trying to retrieve the JSON rows that have the supplied purchase order number. I tried the queries below; even though there are JSON rows for that purchase order number, the queries return nothing.
Queries I tried:
SELECT * from edi_records , jsonb_array_elements(valid_record :: jsonb ->'loop_id_hls') hls,jsonb_array_elements(hls->'loop_id_hlo') hlo where hlo->'product_order_reference' ->> 'purchase_order_number' = '0007-8653547-0590';
SELECT * from edi_records , jsonb_array_elements(valid_record :: jsonb ->'loop_id_hls') hls,jsonb_array_elements(hls->'loop_id_hlo') hlo where hlo ->> 'purchase_order_number' = '0007-8653547-0590';
SELECT * from edi_records , jsonb_array_elements(valid_record :: jsonb ->'advance_shipment_notice'::text->'loop_id_hls') hls,jsonb_array_elements(hls->'loop_id_hlo') hlo where hlo ->> 'purchase_order_number' = '0007-8653547-0590';
SELECT track_num from edi_records , jsonb_array_elements(valid_record :: jsonb ->'advance_shipment_notice'->'loop_id_hls') hls,jsonb_array_elements(hls->'loop_id_hlo') hlo where hlo -> 'product_order_reference'->> 'purchase_order_number' ::text = '0007-8653547-0590';
Can anyone please help me solve this? I am stuck here.
I copied and pasted your JSON object. It's a bit large, but I was able to get the order number. The main hassle is all the nested arrays.
The downside is that I am digging into the JSON object manually. If the structure changes, or if the keys contain duplicate objects that require a bit of searching, then the results would be wrong. I am sure this can be improved.
SELECT
your_json -> 'advance_shipment_notice'
-> 'loop_id_hls'
-> 0 -- {loop_id_hls}
-> 'loop_id_hlo'
-> 0 -- {loop_id_hlo}
-> 'product_order_reference'
-> 'purchase_order_number' AS purchase_order_number
FROM your_json;
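If the nesting depth varies, a recursive jsonpath search is one way to avoid hard-coding the array indexes; a sketch assuming PostgreSQL 12+ (edi_records and valid_record are from the question):
-- match rows where purchase_order_number appears at any depth with the given value
SELECT *
FROM edi_records
WHERE jsonb_path_exists(
        valid_record::jsonb,
        '$.**.purchase_order_number ? (@ == "0007-8653547-0590")'
      );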