Need nosql database for queries with bitwise condition - sql

I am currently using Apache Cassandra to store my data, but Cassandra does not support bitwise operations in queries.
I need to execute a query like this:
select count(*) from table where field1 = ? and BIT_COUNT(field2 ^ ?) <= 10;
Can you advise a NoSQL or fast embedded SQL solution?
The database contains more than 1 million rows.

If you're happy with Cassandra otherwise, you could add Spark and use Spark SQL to do queries like that. Spark has an open-source connector to use Cassandra as its distributed database.
There's also DataStax Enterprise which would allow you to integrate with Hadoop/Hive and get similar analytic capabilities. (DataStax Enterprise is also an easy way to get Spark functionality.)
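
If an embedded SQL engine is acceptable, one possible sketch uses SQLite through Python's built-in sqlite3 module. SQLite has neither a BIT_COUNT function nor an XOR operator, so the example registers a user-defined function for the Hamming distance. The function name HAMMING, the table name items, the helper count_close_rows and the sample rows are all assumptions made for illustration, not part of the original question.

```python
import sqlite3

MASK64 = 0xFFFFFFFFFFFFFFFF  # SQLite integers are 64-bit signed; mask to count bits safely


def hamming(a, b):
    """Number of differing bits between two 64-bit integers (BIT_COUNT(a ^ b))."""
    return bin((a ^ b) & MASK64).count("1")


def count_close_rows(conn, field1, probe, max_distance=10):
    """Count rows with the given field1 whose field2 is within max_distance bits of probe."""
    cur = conn.execute(
        "SELECT COUNT(*) FROM items WHERE field1 = ? AND HAMMING(field2, ?) <= ?",
        (field1, probe, max_distance),
    )
    return cur.fetchone()[0]


# Hypothetical in-memory database with a few sample rows.
conn = sqlite3.connect(":memory:")
conn.create_function("HAMMING", 2, hamming)  # expose the Python function to SQL
conn.execute("CREATE TABLE items (field1 TEXT, field2 INTEGER)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [("a", 0b1111), ("a", 0xFFFFFF), ("b", 0b1111)],
)

print(count_close_rows(conn, "a", 0))
```

The Python callback runs once per candidate row, so an index on field1 would help keep the number of rows it has to inspect small. Rows where field2 is NULL are not handled in this sketch.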

Related

Azure SQL Update Performance Indexed

On Azure SQL Database:
UPDATE SomeLargeTable
SET [nonPKbutIndexedColumn] = newValue
WHERE [nonPKbutIndexedColumn] = value;
UPDATE SomeLargeTable
SET [nonPKbutIndexedColumn] = newValue
WHERE [PKcolumn] IN (SELECT [PKcolumn] FROM SomeLargeTable
WHERE [nonPKbutIndexedColumn] = value);
How do these two queries compare in performance? Other suggestions are also welcome.
The performance of any Data Manipulation Language (DML) command depends on many factors like volume of data in the tables, how efficiently the schema is designed, etc.
As long as your tables are properly indexed, both queries will run fine. There shouldn't be any performance issue. You can check the time taken for the query at the bottom of Query Editor in Azure SQL Database.
Additionally, you can use Query Performance Insight in Azure SQL Database, which provides intelligent query analysis for single and pooled databases. It helps identify the top resource-consuming and long-running queries in your workload, so you can find the queries to optimize, improve overall workload performance, and use resources efficiently.

Raw SQL on Airflow

I'd prefer to use raw SQL (mainly SELECT + INSERT) instead of an O/R mapper, because my queries would be difficult to express with one.
(The RDBMS is PostgreSQL 9.4.)
So the question is:
Can I use raw SQL for the logic part in Airflow?
You can create tasks that run raw SQL in a Postgres database directly using the PostgresOperator. You will need to set up your Postgres database as a Connection object in order for Airflow to know how to connect to the database.

How can I use PL Sql in Hive using Spark?

val hiveContext = new HiveContext(sc)
val s = hiveContext.sql("SELECT * FROM Test")
But I don't know how to use PL/SQL in Hive. Please help me.
It does not make sense to pass PL/SQL code to hiveContext.sql(), because it expects a query string, not a procedure.
The method returns a new DataFrame and does not execute procedural operations the way PL/SQL code does.
https://spark.apache.org/docs/1.3.0/api/java/org/apache/spark/sql/hive/HiveContext.html
It appears the answer is "yes", which I found in about 20 seconds by googling "hive spark pl/sql". And it has a reference manual here
HPL/SQL is an open source tool (Apache License 2.0) that implements
procedural SQL language for Apache Hive, SparkSQL, Impala as well as
any other SQL-on-Hadoop implementation, any NoSQL and any RDBMS.
HPL/SQL is a hybrid and heterogeneous language that understands
syntaxes and semantics of almost any existing procedural SQL dialect,
and you can use with any database, for example, running existing
Oracle PL/SQL code on Apache Hive and Microsoft SQL Server, or running
Transact-SQL on Oracle, Cloudera Impala or Amazon Redshift.
HPL/SQL language is compatible to a large extent with Oracle PL/SQL,
ANSI/ISO SQL/PSM (IBM DB2, MySQL, Teradata i.e), PostgreSQL PL/pgSQL
(Netezza), Transact-SQL (Microsoft SQL Server and Sybase) that allows
you leveraging existing SQL/DWH skills and familiar approach to
implement data warehouse solutions on Hadoop. It also facilitates
migration of existing business logic to Hadoop. HPL/SQL is an
efficient way to implement ETL processes in Hadoop.

Do databases besides Postgres have features comparable to foreign data wrappers?

I'm very excited by several of the more recently-added Postgres features, such as foreign data wrappers. I'm not aware of any other RDBMS having this feature, but before I try to make the case to my main client that they should begin preferring Postgres over their current cocktail of RDBMSs, and include in my case that no other database can do this, I'd like to verify that.
I've been unable to find evidence of any other database supporting SQL/MED, and I have found things like this short note stating that Oracle does not support SQL/MED.
The main thing that gives me doubt is a statement on http://wiki.postgresql.org/wiki/SQL/MED:
SQL/MED is Management of External Data, a part of the SQL standard that deals with how a database management system can integrate data stored outside the database.
If FDWs are based on SQL/MED, and SQL/MED is an open standard, then it seems likely that other RDBMSs have implemented it too.
TL;DR:
Does any database besides Postgres support SQL/MED?
IBM DB2 claims compliance with SQL/MED (including the full FDW API);
MySQL's FEDERATED storage engine can connect to another MySQL database, but NOT to other RDBMSs;
MariaDB's CONNECT engine allows access to various file formats (CSV, XML, Excel, etc.), gives access to "any" ODBC data source (Oracle, DB2, SQL Server, etc.), and can access data in the MyISAM and InnoDB storage engines;
Farrago has some of it too;
PostgreSQL implements parts of it (notably, it does not implement routine mappings and has a simplified FDW API). FDWs have been usable for reading since PG 9.1 and for writing since 9.3; before that there was DBI-Link.
The PostgreSQL community has plenty of nice FDWs, such as the NoSQL FDWs (couchdb_fdw, mongo_fdw, redis_fdw), Multicorn (for writing the wrapper itself in Python instead of C), or the wild PG-Strom (which uses the GPU for some operations!).
SQL Server has the concept of Linked Servers (http://technet.microsoft.com/en-us/library/ms188279.aspx), which allows you to connect to external data sources (Oracle, other SQL instances, Active Directory, File system data via the Indexing Service provider, etc.) and, if you really needed to, you can create your own Providers that can be used by a SQL Server Linked Server.
Another option within SQL Server is the CLR, in which you can write code to retrieve data from web services or other data sources as needed.
While this may not technically be "SQL/MED", it seems to accomplish the same thing.
Distributed query using a local table joined to a 4-part linked server name. I think that in this case the remotetable filter might not be applied until after the entire table has been pulled to the local server (the documentation is fuzzy on this, and I've found articles with conflicting opinions):
SELECT *
FROM LocalDB.dbo.table t
INNER JOIN LinkedServer1.RemoteDB.dbo.remotetable r on t.val = r.val
WHERE r.val < 1000
;
Using OpenQuery, remotetable filter is applied on the remote server, as long as the filter is passed into the OpenQuery 2nd parameter:
SELECT *
FROM LocalDB.dbo.table t
INNER JOIN OPENQUERY(LinkedServer1, 'SELECT * FROM RemoteDB.dbo.remotetable r WHERE r.val < 1000') r on t.val = r.val

How to perform an AND operation in Redis like in MySQL?

Select * from student where id=1 and college='sss'. How can this be achieved in a Redis database?
What you are describing is a query you could perform on an SQL database.
Redis however is a key-value store. Therefore it does not work with SQL queries.
I suggest trying the interactive tutorial to get an idea.
You'd have to maintain your own index, for example in a hash (written with HSET), or use a package like stdnet, which does the indexing for you.
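
To illustrate what "maintaining your own index" could look like, here is a minimal sketch that models the idea with plain Python dictionaries and sets instead of a real Redis client. It assumes one common layout: each record is stored as a hash, each indexed field value gets its own set of ids, and the AND condition becomes a set intersection. The class TinyStore and the key names (student:<id>, idx:student:college:<value>) are hypothetical; the comments name the Redis commands (HSET, SADD, SINTER, HGETALL) that would play the same role.

```python
from collections import defaultdict


class TinyStore:
    """In-memory stand-in for the few Redis structures used here (not a Redis client)."""

    def __init__(self):
        self.hashes = {}               # key -> dict, like a Redis hash
        self.sets = defaultdict(set)   # key -> set, like a Redis set

    def add_student(self, student_id, college):
        # In Redis: HSET student:<id> id <id> college <college>
        self.hashes[f"student:{student_id}"] = {"id": student_id, "college": college}
        # In Redis: SADD idx:student:<field>:<value> <id>, one index set per field value
        self.sets[f"idx:student:id:{student_id}"].add(student_id)
        self.sets[f"idx:student:college:{college}"].add(student_id)

    def find_students(self, student_id, college):
        # In Redis: SINTER idx:student:id:<id> idx:student:college:<college>
        by_id = self.sets.get(f"idx:student:id:{student_id}", set())
        by_college = self.sets.get(f"idx:student:college:{college}", set())
        # In Redis: HGETALL student:<id> for each id in the intersection
        return [self.hashes[f"student:{i}"] for i in sorted(by_id & by_college)]


store = TinyStore()
store.add_student(1, "sss")
store.add_student(2, "sss")
store.add_student(3, "ttt")

print(store.find_students(1, "sss"))
```

Because id is already the record key here, a real application could simply fetch student:1 and compare its college field; the intersection form is shown because it generalizes to AND conditions over any indexed fields. The index sets also have to be updated whenever a record changes or is deleted, which this sketch leaves out.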