pymongo cursor hint with nonexistent index name: "planner returned error: bad hint" - mongodb-query

I'm using pymongo 3.6.0 and issuing a query with a hint and an index name. I thought I didn't need to worry about ensuring the collection had the specified index -- according to the docs, the hint should have no effect if the index doesn't exist: https://api.mongodb.com/python/3.6.0/api/pymongo/cursor.html#pymongo.cursor.Cursor.hint
But the cursor throws an error when trying to retrieve the data after the query is issued.
Example:
>>> cursor = collection.find({'name': 'foo'}).hint('nonexistent_index_name')
>>> cursor
<pymongo.cursor.Cursor>
The query returns a Cursor, but calling anything on the cursor, such as:
>>> cursor.count()
or
>>> list(cursor)
results in the error:
File "/python-2.7/lib/python2.7/site-packages/pymongo/cursor.py", line 1176, in next
if len(self.__data) or self._refresh():
File "/python-2.7/lib/python2.7/site-packages/pymongo/cursor.py", line 1087, in _refresh
self.__send_message(q)
File "/python-2.7/lib/python2.7/site-packages/pymongo/cursor.py", line 974, in __send_message
helpers._check_command_response(first)
File "/python-2.7/lib/python2.7/site-packages/pymongo/helpers.py", line 146, in _check_command_response
raise OperationFailure(msg % errmsg, code, response)
pymongo.errors.OperationFailure: error processing query: ns=collection_nameTree: name == "foo"
Sort: {}
Proj: {}
planner returned error: bad hint
This query returns the expected result when I use an existing index name, or use no hint:
>>> cursor = collection.find({'name': 'foo'}).hint('existing_index_name')
>>> list(cursor)
[{'name': 'foo'}]
>>> cursor = collection.find({'name': 'foo'})
>>> list(cursor)
[{'name': 'foo'}]
Am I doing something wrong?

Resolved: the PyMongo documentation is wrong, and hinting a nonexistent index does raise an error.
Ticket to fix the documentation: https://jira.mongodb.org/browse/PYTHON-1615
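Since the server rejects a hint for an index that does not exist, one workaround is to check the collection's indexes first and only apply the hint when the name is present. A minimal sketch, assuming a PyMongo Collection (index_information(), find() and Cursor.hint() are standard PyMongo methods; the helper name find_with_optional_hint is hypothetical):

```python
def find_with_optional_hint(collection, query, index_name):
    """Return a cursor for `query`, hinting `index_name` only if it exists.

    `collection` is assumed to be a pymongo Collection. index_information()
    returns a dict keyed by index name (e.g. "_id_", "name_1").
    """
    cursor = collection.find(query)
    if index_name in collection.index_information():
        # The index exists, so the server will accept the hint.
        cursor = cursor.hint(index_name)
    # Otherwise let the query planner pick an index, instead of sending
    # a hint the server would reject with "planner returned error: bad hint".
    return cursor
```

Usage: list(find_with_optional_hint(collection, {'name': 'foo'}, 'nonexistent_index_name')). The check costs an extra round trip to list the indexes, and an index could be dropped between the check and the query, so catching pymongo.errors.OperationFailure while iterating the cursor is a reasonable alternative.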


How to pass string arguments with spaces to SQL notebook in databricks?

I have a SQL notebook (notebookA) to which I want to pass arguments from another notebook (notebookB).
---notebookA---
SELECT $v as $c
When I call it from notebookB like this, I get the result:
---notebookB---
%run ./notebookA $v='james' $c=name
But when the value contains a space, I get the following error:
---notebookB---
%run ./notebookA $v='james potter' $c=name
Failed to parse %run command: string matching regex `\$[\w_]+' expected but `p' found)
What would be the solution then?
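Magic commands such as %run cannot handle an argument value that contains a space: as the error shows, after $v='james the parser expects the next $name argument but finds p. Instead, you can use dbutils.notebook.run, which takes the arguments as a dictionary.
Python:
dbutils.notebook.run("notebookA", 60, {"v": "james potter", "c": "name"})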
Reference: https://docs.databricks.com/user-guide/notebooks/notebook-workflows.html

Hive coalesce parse exception

I want to create a Hive script that uses as its database whichever of two given parameters is not null.
My hive-test.sql is this:
set db_name = coalesce(${hiveconf:dbOne}, ${hiveconf:dbTwo});
use ${hiveconf:db_name};
show tables;
and I run it with:
hive -hiveconf dbOne=my_database -f hive-test.sql
and I am getting:
FAILED: ParseException line 2:12 missing EOF at '(' near 'coalesce'
I should note that if I change the first line in script to:
set db_name = my_database;
it works.
I can't figure out what I did wrong. Your assistance is appreciated.
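This is not possible in Hive: set does not evaluate expressions, so db_name is stored as the literal text coalesce(...), which is then substituted into the use statement and fails to parse (hence the error at the '(' after coalesce).
Do the null-coalescing in the shell instead (see, for example, setting-a-shell-variable-in-a-null-coalescing-fashion) and pass the result to Hive with -hiveconf.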
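If Hive is launched from a script anyway, the coalescing can happen before Hive starts. A minimal Python sketch, assuming the hive CLI is on the PATH and that the Hive script drops the set line and uses ${hiveconf:db_name} directly (e.g. use ${hiveconf:db_name};). The helper names first_non_empty and build_hive_command are hypothetical:

```python
def first_non_empty(*values):
    """Return the first value that is neither None nor an empty string."""
    for value in values:
        if value:
            return value
    return None


def build_hive_command(script_path, db_one=None, db_two=None):
    """Build the hive argument list, passing whichever database name is set."""
    db_name = first_non_empty(db_one, db_two)
    if db_name is None:
        raise ValueError("neither dbOne nor dbTwo was given")
    # Equivalent to: hive -hiveconf db_name=<db> -f <script>
    return ["hive", "-hiveconf", "db_name=" + db_name, "-f", script_path]
```

Usage: subprocess.run(build_hive_command("hive-test.sql", db_one="my_database"), check=True). Hive then only ever sees a plain value, so no expression evaluation is needed on its side.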

Pig process multiple file error: ERROR 0: Error while executing ForEach at []

I have 4 files A, B, C, D under the directory /user/bizlog/cpc on HDFS, and each record looks like this:
87465422^C376832^C27786^C21161214^Ckey
Here is my pig script:
cpc_all = load '/user/bizlog/cpc' using PigStorage('\u0003') as (cpcid, accountid, cpcplanid, cpcgrpid, key);
cpc = foreach cpc_all generate accountid, key;
account_group = group cpc by accountid;
account_sort = order account_group by group;
account_key = foreach account_sort generate group, BagToTuple(cpc.key);
store account_key into 'last' using PigStorage('\u0003');
It should produce results such as:
376832^Ckey1^Ckey2
The script above is supposed to process all 4 files, but I get this error:
Backend error message
---------------------
org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while executing (Name: account_key: New For Each(false,false)[bag] - scope-18 Operator Key: scope-18): org.apache.pig.backend.executionengine.ExecException: ERROR 0: Error while executing ForEach at []
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:289)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNextTuple(POForEach.java:242)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.runPipeline(PigGenericMapReduce.java:464)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.processOnePackageOutput(PigGenericMapReduce.java:432)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:412)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.
Pig Stack Trace
---------------
ERROR 0: Exception while executing (Name: account_key: New For Each(false,false)[bag] - scope-18 Operator Key: scope-18): org.apache.pig.backend.executionengine.ExecException: ERROR 0: Error while executing ForEach at []
org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while executing (Name: account_key: New For Each(false,false)[bag] - scope-18 Operator Key: scope-18): org.apache.pig.backend.executionengine.ExecException: ERROR 0: Error while executing ForEach at []
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:289)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNextTuple(POForEach.java:242)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.runPipeline(PigGenericMapReduce.java:464)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.processOnePackageOutput(PigGenericMapReduce.java:432)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:412)
================================================================================
Oddly, if I load a single file, such as load '/user/bizlog/cpc/A', the script succeeds.
If I load each file separately and then union them, it works fine too.
If I move the sort step to the end, the error goes away.
The Hadoop version is 0.20.2 and the Pig version is 0.12.1. Any help will be appreciated.
As mentioned in the comments:
I put the sort step at the end and the error goes away
Though I did not find much on the topic, it appears that Pig does not like ordering the grouped relation itself.
So the 'solution' is to order the output generated from the group instead: run the foreach ... generate on account_group first, and then order that result by group.

How to modify a line in a file with Erlang OTP module

I have a big file and I would like to replace the first line with other content.
When I use {ok, IoDev} = file:open("/root/FileName", [write, raw, binary]), the whole content is removed.
But when I use {ok, IoDev} = file:open("/root/FileName", [append, raw, binary]) and file:pwrite(IoDev, {bof,0}, <<"new content\n">>), I get the result {error, badarg}.
If I set Location to 0, as in file:pwrite(IoDev, 0, <<"new content\n">>), the string is appended at the end of the file.
You seem to be confused about the actual file API.
file:open/2 will truncate the file if you pass [write, raw, binary] as you do:
(about write mode): The file is opened for writing. It is created if it does not exist. If the file exists, and if write is not combined with read, the file will be truncated.
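So, to keep the existing content, you need to pass [read, write] (write combined with read does not truncate). Passing [write, append] also avoids truncation, but in append mode every write operation takes place at the end of the file, which is why your pwrite at position 0 ended up at the tail.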
file:pwrite/3 also works exactly as documented. It allows you to write at a given position in the file. In particular, you cannot pass {bof, 0} as second argument since you opened the file in raw mode:
If IoDevice has been opened in raw mode, some restrictions apply: Location is only allowed to be an integer; and the current position of the file is undefined after the operation.
The following sample code shows how they work:
ok = file:write_file("/tmp/file", "This is line 1.\nThis is line 2.\n"),
{ok, F} = file:open("/tmp/file", [read, write, raw, binary]),
ok = file:pwrite(F, 0, <<"This is line A.\n">>),
ok = file:close(F),
{ok, Content} = file:read_file("/tmp/file"),
io:put_chars(Content),
ok = file:delete("/tmp/file").
It will output:
This is line A.
This is line 2.
This works because the text "This is line A.\n" is exactly as long as "This is line 1.\n": it does not really replace the line, it just overwrites bytes. If you need to replace the first line with content of a different length, you need to rewrite the whole file. A common approach is to write a new file and then swap it in place of the old one. If the file is small enough, however, you can read it entirely into memory and rewrite it; file:read_file/1 and file:write_file/2 will work:
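replace_first_line(Path, NewLine) ->
    {ok, Content} = file:read_file(Path),
    %% binary:split/2 (without [global]) splits only at the first newline,
    %% so Tail holds everything after the first line.
    [_FirstLine | Tail] = binary:split(Content, <<"\n">>),
    NewContent = [NewLine, <<"\n">> | Tail],
    ok = file:write_file(Path, NewContent).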
The question is not specific to Erlang; it is about file operations in general.
Replacing a line in a file requires rewriting the whole file. The easiest way to do so is to write all the new content to a new file and then move that file over the original.
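The write-a-new-file-then-move approach is language-independent, so here is a minimal Python sketch of it (the function name replace_first_line mirrors the Erlang helper above but is otherwise hypothetical). It streams the original file, so the file does not need to fit in memory, writes to a temporary file in the same directory, and then renames it over the original with os.replace:

```python
import os
import shutil
import tempfile


def replace_first_line(path, new_line):
    """Replace the first line of `path` with `new_line` via a temp file."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file next to the original so os.replace stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as dst, open(path, "rb") as src:
            src.readline()  # skip the old first line
            dst.write(new_line.encode() + b"\n")
            # Copy the rest of the file unchanged, in 64 KiB chunks.
            for chunk in iter(lambda: src.read(64 * 1024), b""):
                dst.write(chunk)
        # mkstemp creates owner-only files; keep the original permissions.
        shutil.copymode(path, tmp_path)
        os.replace(tmp_path, path)  # swap the new file in
    except BaseException:
        os.unlink(tmp_path)
        raise
```

Unlike the in-place pwrite trick, this works for a replacement line of any length, at the cost of rewriting the whole file.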

Django Oracle integrity error when saving any instance to database

I'm doing a migration from sqlite to oracle backend. The oracle database already exists and is maintained by other people. Its version is Oracle9i Enterprise Edition Release 9.2.0.1.0.
I have a simple model:
class AliasType(models.Model):
    id = models.AutoField(primary_key=True, db_column="F_ALIAS_ID")
    name = models.CharField(u"Type name", max_length=255, unique=True, db_column="F_ALIAS_NAME")

    class Meta:
        db_table = "ALIAS"
./manage.py syncdb does not return any errors. But when I try to create a new instance and save it to the database, I get the following error:
>>> AliasType.objects.create(name="test")
Traceback (most recent call last):
File "<console>", line 1, in <module>
File "/mnt/Data/private/projects/envs/termary-oracle/src/django/django/db/models/manager.py", line 138, in create
return self.get_query_set().create(**kwargs)
File "/mnt/Data/private/projects/envs/termary-oracle/src/django/django/db/models/query.py", line 360, in create
obj.save(force_insert=True, using=self.db)
File "/mnt/Data/private/projects/envs/termary-oracle/src/django/django/db/models/base.py", line 460, in save
self.save_base(using=using, force_insert=force_insert, force_update=force_update)
File "/mnt/Data/private/projects/envs/termary-oracle/src/django/django/db/models/base.py", line 553, in save_base
result = manager._insert(values, return_id=update_pk, using=using)
File "/mnt/Data/private/projects/envs/termary-oracle/src/django/django/db/models/manager.py", line 195, in _insert
return insert_query(self.model, values, **kwargs)
File "/mnt/Data/private/projects/envs/termary-oracle/src/django/django/db/models/query.py", line 1435, in insert_query
return query.get_compiler(using=using).execute_sql(return_id)
File "/mnt/Data/private/projects/envs/termary-oracle/src/django/django/db/models/sql/compiler.py", line 791, in execute_sql
cursor = super(SQLInsertCompiler, self).execute_sql(None)
File "/mnt/Data/private/projects/envs/termary-oracle/src/django/django/db/models/sql/compiler.py", line 735, in execute_sql
cursor.execute(sql, params)
File "/mnt/Data/private/projects/envs/termary-oracle/src/django/django/db/backends/util.py", line 18, in execute
return self.cursor.execute(sql, params)
File "/mnt/Data/private/projects/envs/termary-oracle/src/django/django/db/backends/oracle/base.py", line 630, in execute
return self.cursor.execute(query, self._param_generator(params))
IntegrityError: ORA-01400: cannot insert NULL into ("SINCE"."ALIAS"."F_ALIAS_ID")
If I specify the id, e.g. AliasType.objects.create(id=5, name="test"), it works. I thought Django should be able to retrieve the id value automatically. I've learnt that Oracle does not support autoincrement, and that I should use triggers and sequences. I was told that there is an existing sequence in the database that returns ids for all new rows, and I know its name, say SEQ_GET_NEW_ID.
So the question is how to implement that in the most elegant way, i.e. how to tell Django to get id values for all new objects from the sequence named SEQ_GET_NEW_ID without hacking it too much (e.g. overriding save() methods for all models)?
There is an open ticket (#1946) to allow exactly that: overriding the default sequence name. But since it is not closed yet, I don't think there is a way to do this without hacking.
I haven't used Oracle before, but a quick search suggests that it is possible to create aliases (synonyms) for sequences. manage.py sqlall <app> should show you the sequence name Django is expecting, so you could probably just make that name an alias for SEQ_GET_NEW_ID.