Kmodes function Error in x[[jj]][iseq] <- vjj : replacement has length zero - sql

I got this error when running kmodes on a large data set. After I cleaned the data and used Data<-na.omit(Data) to delete all rows with missing values, the script ran in RStudio without errors.
When I run the same script in SQL as an external script, I get the same error as before:
Error in x[[jj]][iseq] <- vjj : replacement has length zero
even though the R script and the data set are identical.
Has anyone had the same issue, and how did you solve it?
Thanks

Related

DBT: How to fix Database Error Expecting Value?

I ran into trouble today while running Airflow with airflow-dbt-python. I tried to debug using the logs, and the error shown there was this one:
[2022-12-27, 13:53:53 CET] {functions.py:226} ERROR - 12:53:53.642186 [error] [MainThread]: Encountered an error:
Database Error
Expecting value: line 2 column 5 (char 5)
Quite a weird one.
Check the credentials file that allows DBT to run queries on your database (in our case, DBT runs against BigQuery). In our case the credentials file was empty. We also tried running DBT directly on the worker instead of through Airflow, and got exactly the same error. Unfortunately, the error message is not very explicit about the cause.
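One way to catch this early is to check the keyfile before DBT runs. The sketch below is a minimal illustration in Python and is not part of DBT or Airflow: the helper name `check_keyfile` and the example path are hypothetical, and it assumes only that the credentials file is supposed to contain a JSON object (as a BigQuery service-account keyfile does). "Expecting value" is the message Python's json module produces when it finds no JSON value where one is expected.

```python
import json
from pathlib import Path


def check_keyfile(path):
    """Return None if the file holds a JSON object, else a short reason string.

    Hypothetical helper for illustration; it is not part of DBT or Airflow.
    """
    p = Path(path)
    if not p.is_file():
        return f"{path}: file not found"
    text = p.read_text(encoding="utf-8")
    if not text.strip():
        return f"{path}: file is empty"
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        # json reports problems with messages such as "Expecting value: ..."
        return f"{path}: invalid JSON ({exc})"
    if not isinstance(data, dict):
        return f"{path}: expected a JSON object"
    return None


if __name__ == "__main__":
    # Placeholder path; substitute the keyfile your profile points to.
    problem = check_keyfile("/path/to/keyfile.json")
    print(problem or "keyfile looks like valid JSON")
```

Running a check like this as a task ahead of the DBT task would surface an empty or truncated keyfile with a clearer message than the generic Database Error.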

When trying to get the source of a function in Postgres using psql, what does the error "column p.proisagg does not exist" mean?

Background:
Using Postgres 11 on RDS; the interface is psql on a CentOS 7 box. The objective is to show the source of certain stored procedures / functions so that I can work with them.
Problem description: When I attempt to list / show the source of a given stored function using the \df+ command, which I understand to be correct for this use based on [official docs here](https://www.postgresql.org/docs/current/app-psql.html), an error is given as shown:
psql=> \df+ schema_foo.proc_bar;
ERROR: column p.proisagg does not exist
LINE 6: WHEN p.proisagg THEN 'agg'
I have no clue how to interpret this; the function in question contains neither the snippet of logic shown in the error nor the column p.proisagg referenced there. I have verified this by opening the function in vim with \ef.
My guess, based on several unrelated GitHub issues that mention this same error, is that it refers to some schema code internal to Postgres.
Summary: I can view the source of the functions using \ef, so my work is not impaired from a practical standpoint; however, I wish to understand this error and why I'm encountering it with \df+.
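I had the same issue with phpPgAdmin and ran these 2 commands to fix it. PostgreSQL 11 replaced the pg_proc.proisagg column with prokind, so catalog queries written for older versions (including the one an older psql client issues for \df+) fail with this error.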
sed -i "s/NOT pp.proisagg/pp.prokind='f'/g" /usr/share/phpPgAdmin/classes/database/Postgres.php
sed -i "s/NOT p.proisagg/p.prokind='f'/g" /usr/share/phpPgAdmin/classes/database/Postgres.php

Unable to extract data with double pipe delimiter in Pig Script

I am trying to load data that is delimited by a double pipe (||) in Pig. This is my command:
L = LOAD 'entirepath_in_HDFS/b.txt/part-m*' USING PigStorage('||');
I am getting the following error:
2016-08-04 23:58:21,122 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: Pig script failed to parse:
<line 1, column 4> pig script failed to validate: java.lang.RuntimeException: could not instantiate 'PigStorage' with arguments '[||]'
My sample input file has exactly 5 lines, as follows:
POS_TIBCO||HDFS||POS_LOG||1||7806||2016-07-18||1||993||0
POS_TIBCO||HDFS||POS_LOG||2||7806||2016-07-18||1||0||0
POS_TIBCO||HDFS||POS_LOG||3||7806||2016-07-18||1||0||5
POS_TIBCO||HDFS||POS_LOG||4||7806||2016-07-18||1||0||0
POS_TIBCO||HDFS||POS_LOG||5||7806||2016-07-18||1||0||19.99
I tried several options, such as putting a backslash before the delimiter (\||, \|\|), but everything failed. I also tried with a schema but got the same error. I am using Hortonworks (HDP 2.2.4) and Pig (0.14.0).
Any help is appreciated. Please let me know if you need any further details.
I have faced this case, and from checking the PigStorage source code, I think the PigStorage argument must parse to exactly one character.
So we can use this code instead:
L0 = LOAD 'entirepath_in_HDFS/b.txt/part-m*' USING PigStorage('|');
L = FOREACH L0 GENERATE $0,$2,$4,$6,$8,$10,$12,$14,$16;
This is helpful if you know how many columns you have, and it will not affect performance because it runs map-side.
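To see why the even-numbered positions are the ones to keep, here is a small sketch that splits one of the sample lines on a single `|`. Python is used only for illustration (it is not Pig), and the function name `split_double_pipe` is made up for this example. It assumes that no field itself contains a pipe character.

```python
def split_double_pipe(line):
    """Split on a single '|' and keep even-indexed pieces, mirroring $0,$2,...,$16."""
    pieces = line.split("|")
    # Each "||" yields an empty string between two real fields,
    # so the real values sit at indices 0, 2, 4, ...
    return pieces[::2]


sample = "POS_TIBCO||HDFS||POS_LOG||1||7806||2016-07-18||1||993||0"
print(len(sample.split("|")))      # 17 pieces, 8 of them empty
print(split_double_pipe(sample))
```

The nine values that remain correspond to the nine columns selected by the FOREACH ... GENERATE statement.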
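When you load data using PigStorage, it expects only a single character as the delimiter.
However, if you still want to achieve this, you can use MyRegExLoader from piggybank. Its argument is a regular expression in which each capture group becomes one field, so the literal pipes must be escaped; the paths, pattern and three-field schema below are placeholders to adapt to your data:
REGISTER '/path/to/piggybank.jar';
A = LOAD '/path/to/dataset' USING org.apache.pig.piggybank.storage.MyRegExLoader('([^|]*)\\|\\|([^|]*)\\|\\|([^|]*)')
as (movieid:int, title:chararray, genre:chararray);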

Error in BQ shell Loading Datastore with write_disposition as Write append

1. I tried to load into an existing table (using a Datastore file).
2. The bq shell asked me to add write_disposition set to write append in order to load into an existing table.
3. If I do the above, it throws the following error:
load --source_format=DATASTORE_BACKUP --write_disposition=WRITE_append --allow_jagged_rows=None sample_red.t1estchallenge_1 gs://test.appspot.com/bucket/ahFzfnZpcmdpbi1yZWQtdGVzdHJBCxIcX0FFX0RhdGFzdG9yZUFkbWluX09wZXJhdGlvbhiBwLgCDAsSFl9BRV9CYWNrdXBfSW5mb3JtYXRpb24YAQw.entity.backup_info
Error parsing command: flag --allow_jagged_rows=None: ('Non-boolean argument to boolean flag',None)
I tried --allow_jagged_rows=0 and --allow_jagged_rows=None; neither works, and I get the same error.
Please advise on this.
UPDATE: As Mosha suggested, --allow_jagged_rows=false has worked. It should come before --write_disposition=WRITE_TRUNCATE. But this has led to another issue with encoding. Can anyone say what the encoding type should be for DATASTORE_BACKUP? I tried both --encoding=UTF-8 and --encoding=ISO-8859.
load --source_format=DATASTORE_BACKUP --allow_jagged_rows=false --write_disposition=WRITE_TRUNCATE sample_red.t1estchallenge_1 gs://test.appspot.com/STAGING/ahFzfnZpcmdpbi1yZWQtdGVzdHJBCxIcX0FFX0RhdGFzdG9yZUFkbWluX09wZXJhdGlvbhiBwLgCDAsSFl9BRV9CYWNrdXBfSW5mb3JtYXRpb24YAQw.entityname.backup_info
Please advise.
You should use "false" (or "true") with boolean arguments, i.e.
--allow_jagged_rows=false

Print only nonzero results using AMPL + Neos server

I'm building a relatively big optimization model. I will use 15 time steps in this model, but while testing I am using only 4. However, even with 11 fewer time steps than desired, the model still prints 22,000 rows of variables, of which perhaps merely a hundred differ from 0.
Does anyone see a way past this, i.e. a way to have NEOS Server print only the variable name and corresponding value if it is higher than 0?
What I've tested is:
solve;
option omit_zero_rows 0; (also tried 1;)
display _varname, _var;
With either omit_zero_rows 0; or omit_zero_rows 1; every result is still printed, not just those higher than 0.
I've also tried:
solve;
if _var > 0 then {
display _varname, _var;
}
but it gave me a syntax error. Both (or really, all three) variants were tested in the .run file I use for NEOS Server.
I'm posting a solution, as I believe more people will stumble upon this issue. Basically, in order to print only non-zero values using NEOS Server, write your command file (.run file) as:
solve;
display {j in 1.._nvars: _var[j] > 0} (_varname[j], _var[j]);