Aerospike AQL Query Against List of Values

In AQL I can do
select * from ns.set where PK='some val'
How can I query against a list of values? Something like
select * from ns.set where PK in ('val1', 'val2'...)
When I try to run the above query, I get 'Unsupported command format'.

Not supported in AQL.
Aerospike clients (what you will use in production) support this functionality. It's called Batch Index Read: http://www.aerospike.com/docs/guide/batch.html
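For illustration, here is a minimal sketch of a Batch Index Read using the Aerospike Python client; get_many is the batch-read call in the classic client API, and the host, namespace, and set names below are placeholders.

import aerospike

# Placeholders: adjust the host, namespace ('ns'), and set name ('set') to your cluster.
client = aerospike.client({'hosts': [('127.0.0.1', 3000)]}).connect()

# Build one key tuple per value you would have put in the IN list.
keys = [('ns', 'set', pk) for pk in ('val1', 'val2', 'val3')]

# get_many issues a single batch index read and returns one (key, meta, bins)
# record per requested key; meta is None for keys that do not exist.
for key, meta, bins in client.get_many(keys):
    if meta is not None:
        print(key, bins)

client.close()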

Related

What is the equivalent of SELECT INTO in Google BigQuery

I am trying to write a SQL command in BigQuery that inserts data from one table into a new table without an INSERT statement (something similar to SELECT INTO), but I cannot find a way to do it.
Here is the table:
create table database1.table1
(
pdesc string,
num int64
);
And here is the insert statement. I also tried SELECT INTO, but it is not supported in BigQuery.
insert into database1.table1
select column1, count(column2) as num
from database1.table2
group by column1;
The above is a possible way to insert, but I am looking for a way that does not require a SELECT statement; something similar to a 'SELECT INTO' statement.
I am thinking of declaring variables and then somehow feeding the data into the table, but I do not know how.
I am not a Google employee. However, I understand the reasoning for not supporting creating a copy of a table (or query) from the console.
The challenge is that each table that is created must have a number of attributes defined, such as the associated project and expiry time.
Looking through the documentation (briefly), it is worth exploring the bq utility, specifically the cp command.
Explore the following operations:
cache the query results to a temporary table
get the name of said temporary table
pass it to a copy-table command, perhaps?
Other methods are described in the Google Cloud documentation: https://cloud.google.com/bigquery/docs/managing-tables#copy-table
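In the same spirit, here is a hedged sketch using the google-cloud-bigquery Python client: a query job can write its result straight into a destination table, which is the closest analogue of SELECT INTO. The project and table IDs below are placeholders.

from google.cloud import bigquery

client = bigquery.Client()  # picks up the default project from the environment

# Materialize the query result directly into a destination table.
job_config = bigquery.QueryJobConfig(
    destination="my_project.database1.table1",  # placeholder table ID
    write_disposition="WRITE_TRUNCATE",
)
sql = """
    SELECT column1 AS pdesc, COUNT(column2) AS num
    FROM database1.table2
    GROUP BY column1
"""
client.query(sql, job_config=job_config).result()  # wait for completion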

How to check if a table exists in Hive?

I am connecting to Hive via an ODBC driver from a .NET application. Is there a query to determine if a table already exists?
For example, in MSSQL you can query the INFORMATION_SCHEMA views, and in Netezza you can query the _v_table view.
Any assistance would be appreciated.
Execute the following command: show tables in DB like 'TABLENAME'
If the table exists, its name will be returned; otherwise nothing will be returned.
This is done directly from Hive; for more options see the Hive documentation.
DB is the database in which you want to see if the table exists.
TABLENAME is the table name you seek.
What actually happens is that Hive queries its metastore (which, depending on your configuration, can be a standard RDBMS such as MySQL), so you can optionally connect directly to the same metastore and write your own query to see if the table exists.
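For illustration, here is a hedged sketch of that direct-metastore route in Python, assuming a MySQL-backed metastore. TBLS and DBS are the standard metastore tables; the connection details are placeholders.

import MySQLdb

# Placeholder credentials for the metastore database.
conn = MySQLdb.connect(host="metastore-host", db="hive_metastore",
                       user="hive", passwd="secret")
cursor = conn.cursor()

# TBLS holds table names; DBS holds database names.
cursor.execute(
    "SELECT t.TBL_NAME FROM TBLS t JOIN DBS d ON t.DB_ID = d.DB_ID "
    "WHERE d.NAME = %s AND t.TBL_NAME = %s",
    ("default", "my_table"),
)
print("exists" if cursor.fetchone() else "not found")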
There are two approaches by which you can check that:
1.) As @dimamah suggested; just to add one point here, for this approach you need to
1.1) start the hiveserver before running the query
1.2) run two queries
1.2.1) USE <database_name>
1.2.2) SHOW TABLES LIKE 'table_name'
1.2.3) then check your result using the ResultSet
2.) The second approach is to use the HiveMetastoreClient APIs, where you can directly use the APIs to check whether table_name exists in a particular database.
For further help, please go through the HiveMetastoreClient API documentation.
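Since the question mentions an ODBC driver, here is a minimal sketch of approach 1 over ODBC using pyodbc; the DSN and the database/table names are placeholders, and the same two queries apply from a .NET application.

import pyodbc

conn = pyodbc.connect("DSN=HiveDSN", autocommit=True)  # placeholder DSN
cursor = conn.cursor()

cursor.execute("USE mydb")                     # step 1.2.1
cursor.execute("SHOW TABLES LIKE 'my_table'")  # step 1.2.2
# Step 1.2.3: the table exists iff a row came back.
exists = cursor.fetchone() is not None
print("my_table exists" if exists else "my_table not found")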
When programming on Hive with Spark SQL, you can use the following method to check whether a Hive table exists.
if (hiveContext.hql("SHOW TABLES LIKE '" + tableName + "'").count() == 1) {
  println(tableName + " exists")
}
If someone is using a shell script like me, then my answer could be useful. Assume that your table is in the default database.
table=your_hive_table
validateTable=$(hive --database default -e "SHOW TABLES LIKE '$table'")
if [[ -z $validateTable ]]; then
echo "Error:: $table cannot be found"
exit 1
fi
If you're using SparkSQL you can do the following.
if "table_name" in sqlContext.tableNames("db_name"):
...do something
http://spark.apache.org/docs/2.1.0/api/python/pyspark.sql.html#pyspark.sql.SQLContext.tableNames
Code similar to the one below can be found in many of my Spark notebooks:
stg_table_exists = sqlCtx.sql("SHOW TABLES IN " + stg_db) \
    .filter("tableName = '%s'" % stg_tab_name).collect()
(made a two-liner for readability)
I wish Spark would have an API call to check the same.
If you're using a Scala Spark app and Spark SQL, you can do the following:
if (spark.catalog.tableExists("tablename")) { /* do something */ }

Bind variables in the from clause for Postgres

I'm attempting to write an extension for SQL Developer to better support Postgres. These are just XML configuration files with SQL snippets in them. To display the values for a postgres sequence, I need to run a simple query of the following form:
select * from schema.sequence
The trouble with this is that the Oracle SQL Developer environment provides the correct schema and node (sequence) name as bind variables. This would mean that I should format the query as:
select * from :SCHEMA.:NAME
The trouble is that bind variables are only valid in the SELECT clause or the WHERE clause (as far as I'm aware), and using this form of the query returns a 'syntax error at or near "$1"' error message.
Is there a way to return the values in the sequence object without directly selecting them from the sequence? Perhaps some obtuse joined statement from pg_catalog tables?
Try this:
select *
from information_schema.sequences
where sequence_name = :name
and sequence_schema = :schema;
It's not exactly the same thing as doing a select from the sequence, but the basic information is there.
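To see the distinction in runnable form, here is a hedged Python sketch with psycopg2 (the connection string and names are placeholders): parameters bind fine as values in the WHERE clause, which is why the information_schema query works while :SCHEMA.:NAME does not.

import psycopg2

conn = psycopg2.connect("dbname=mydb")  # placeholder connection string
cur = conn.cursor()

# Identifiers (schema and relation names) cannot be bound as parameters,
# but values in the WHERE clause can:
cur.execute(
    """
    SELECT *
    FROM information_schema.sequences
    WHERE sequence_name = %s
      AND sequence_schema = %s
    """,
    ("my_sequence", "public"),
)
print(cur.fetchone())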

Hive CLI doesn't support MySQL-style data import to tables

Why can't we import data into the Hive CLI as follows? The hive_test table has user and comments columns.
insert into table hive_test (user, comments)
value ("hello", "this is a test query");
Hive throws following exception in hive CLI
FAILED: ParseException line 1:28 cannot recognize input near '(' 'user' ',' in select clause
I don't want to import the data through a csv file like the following for testing purposes.
load data local inpath '/home/hduser/test_data.csv' into table hive_test;
It's worth noting that Hive advertises "SQL-like" syntax, rather than actual SQL syntax. There's no particular reason to think that pure SQL queries will actually run on Hive. HiveQL's DML is documented here on the Wiki, and does not support the column specification syntax or the VALUES clause. However, it does support this syntax:
INSERT INTO TABLE tablename1 SELECT ... FROM ...
Extrapolating from these test queries, you might be able to get something like the following to work:
INSERT INTO TABLE hive_test SELECT 'hello', 'this is a test query' FROM src LIMIT 1
However, it does seem that Hive is not really optimized for this small-scale data manipulation. I don't have a Hive instance to test any of this on.
I think it is because user is a built-in (reserved) keyword.
Try this:
insert into table hive_test (`user`, comments)
values ('hello', 'this is a test query');

Django: Using custom raw SQL inserts with executemany and MySQL

I need to upload a lot of data to a MySQL db. For most models I use django's ORM, but one of my models will have billions (!) of instances and I would like to optimize its insert operation.
I can't seem to find a way to make executemany() work, and after googling it seems there are almost no examples out there.
I'm looking for the correct sql syntax + correct command syntax + correct values data structure to support an executemany command for the following sql statement:
INSERT INTO `some_table` (`int_column1`, `float_column2`, `string_column3`, `datetime_column4`) VALUES (%d, %f, %s, %s)
Yes, I'm explicitly stating the id (int_column1) for efficiency.
A short example code would be great
Here's a solution that actually uses executemany() !
Basically the idea in the example here will work.
But note that in Django, you need to use the %s placeholder rather than the question mark.
Also, you will want to manage your transactions. I'll not get into that here as there is plenty of documentation available.
from django.db import connection, transaction

cursor = connection.cursor()
query = '''INSERT INTO table_name (var1, var2, var3)
           VALUES (%s, %s, %s)'''
# build_query_list() represents some function that populates the list
# with multiple records in the tuple format (value1, value2, value3).
query_list = build_query_list()
cursor.executemany(query, query_list)
transaction.commit()
Are you seriously suggesting loading billions of rows (sorry, instances) of data via some ORM data access layer? How long do you have?
bulk load if possible - http://dev.mysql.com/doc/refman/5.1/en/load-data.html
If you need to modify the data, bulk load with LOAD DATA into a temporary table as-is, then apply the modifications with an INSERT INTO ... SELECT command. IME, this is by far the fastest way to get a lot of data into a table.
I'm not sure how to use the executemany() command, but you can use a single SQL INSERT statement to insert multiple records.
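For completeness, here is a hedged sketch of that multi-row INSERT using MySQLdb directly; the connection details are placeholders, and the table and column names are the ones from the question.

import MySQLdb

conn = MySQLdb.connect(db="mydb", user="me", passwd="secret")  # placeholders
cursor = conn.cursor()

rows = [
    (1, 0.5, "foo", "2013-01-01 00:00:00"),
    (2, 1.5, "bar", "2013-01-02 00:00:00"),
]
# One INSERT with one VALUES tuple per row; the driver escapes every parameter.
placeholders = ", ".join(["(%s, %s, %s, %s)"] * len(rows))
sql = ("INSERT INTO some_table "
       "(int_column1, float_column2, string_column3, datetime_column4) "
       "VALUES " + placeholders)
cursor.execute(sql, [value for row in rows for value in row])
conn.commit()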