How to write riak query in riakc - sql

I am a noob at riak, and have been trying to test the query aspect of riak using riakc in erlang. but I can not find any example of how to query the database that match the old SQL way, Only on how to get a single value out of a single field. I think I am missing something but all I really want is just a standard SQL query with a matching riakc code.
SELECT * FROM bucket;
SELECT * FROM bucket LIMIT 10, 100;
SELECT id, name FROM bucket;
SELECT * FROM bucket WHERE name="john" AND surname LIKE "Ste%";
SELECT * FROM bucket LEFT JOIN bucket2 ON bucket.id = bucket2.id2;
I assume that there is no direct correlation on how you write these, but was hoping there is a standard way, and is there somewhere that has a a simple to understand way of explaining this querys in riakc (or even just riak).
I have looked a mapreduce but found that it was confusing for just simple queries

Riak is a NoSQL database, more specifically a key-value database, and there is no query language like SQL available. When working with Riak you need to model and query your data in a completely different way compared to how you use a relational database in order to get the most from it. Trying to model and query your data in a relational manner, e.g. by extensive use of secondary indexes or by trying to use map/reduce as a real-time query language, generally results in very poor performance and scalability. A good and useful discussion about Riak development anti-patterns that can be found here.

Related

What is possible in SOQL but not in SQL?

Are there any functions / keywords / syntax in SOQL queries that does NOT have an equivalent operation in SQL?
Basically, does there exist a SOQL query that you couldn't convert directly into a SQL query?
Weird question, why do you ask? And which SQL you mean exactly, Oracle, SQL Server flavour, Maria DB or what?
I'd say you'll have hard time
mapping SELECT Account.Owner.Manager.Profile.Name FROM Opportunity into "normal" joins
replicating TOLABEL() (translate picklist values on the fly)
replicating anything SF-specific like WITH (say knowledge base's data categories) or USING SCOPE (you can pull "my accounts" but can you pull "my team's accounts"? "My territory's accounts"? Without an orgy of joins)
doing joins over mutant polymorphic fields like Task.WhatId or ContentDocumentLink.LinkedEntityId
doing any kind of SOSL, especially if org uses synonyms
converting currencies on the fly
doing things like FISCAL_YEAR() without orgy of joins to Period table
replicating any geolocation-related queries (accounts up to 10 km away from me) without knowing exactly what GIS (or whatever) SF uses
doing soft deletes (or however Recycle Bin really works) without impact on performance. I mean if records go to another table - columns are identical and join/view happens magically when you query ALL ROWS?
doing any Person Account stuff, silently querying and updating effectively 2 tables (as materialised view maybe?)
Some differences of SOQL:
No SELECT *
No views
SOQL read-only
Limited indexes
Object-relational mapping is automatic
Schema changes protected

How can I perform the same query on multiple tables in Redshift

I'm working in SQL Workbench in Redshift. We have daily event tables for customer accounts, the same format each day just with updated info. There are currently 300+ tables. For a simple example, I would like to extract the top 10 rows from each table and place them in 1 table.
Table name format is Events_001, Events_002, etc. Typical values are Customer_ID and Balance.
Redshift does not appear to support declare variables, so I'm a bit stuck.
You've effectively invented a kind of pseudo-partitioning; where you manually partition the data by day.
To manually recombine the tables create a view to union everything together...
CREATE VIEW
events_combined
AS
SELECT 1 AS partition_id, * FROM events_001
UNION ALL
SELECT 2 AS partition_id, * FROM events_002
UNION ALL
SELECT 3 AS partition_id, * FROM events_003
etc, etc
That's a hassle, you need to recreate the view every time you add a new table.
That's why most modern databases have partitioning schemes built in to them, so all the boiler-plate is taken care of for you.
But RedShift doesn't do that. So, why not?
In general because RedShift has many alternative mechanisms for dividing and conquering data. It's columnar, so you can avoid reading columns you don't use. It's horizontally partitioned across multiple nodes (sharded), to share the load with large volumes of data. It's sorted and compressed in pages to avoid loading rows you don't want or need. It has dirty pages for newly arriving data, which can then be cleaned up with a VACUUM.
So, I would agree with others that it's not normal practice. Yet, Amazon themselves do have a help page (briefly) describing your use case.
https://docs.aws.amazon.com/redshift/latest/dg/c_best-practices-time-series-tables.html
So, I'd disagree with "never do this". Still, it is a strong indication that you've accidentally walked in to an anti-pattern and should seriously re-consider your design.
As others have pointed out many small tables in Redshift is really inefficient, like terrible if taken to the extreme. But that is not your question.
You want to know how to perform the same query on multiple tables from SQL Workbench. I'm assuming you are referring to SQLWorkbench/J. If so you can define variables in the bench and use these variable in queries. Then you just need to update the variable and rerun the query. Now SQLWorkbench/J doesn't offer any looping or scripting capabilities. If you want to loop you will need to wrap the bench in a script (like a BAT file or a bash script).
My preference is to write a jinja template with the SQL in it along with any looping and variable substitution. Then apply a json with the table names and presto you have all the SQL for all the tables in one file. I just need to run this - usually with the psql cli but at times I'm import it into my bench.
My advice is to treat Redshift as a query execution engine and use an external environment (Lambda, EC2, etc) for the orchestration of what queries to run and when. Many other databases (try to) provide a full operating environment inside the database functionality. Applying this pattern to Redshift often leads to problems. Use Redshift for what it is great at and perform the other actions elsewhere. In the end you will find that the large AWS ecosystem provides extended capabilities as compared to other databases, it's just that these aren't all done inside of Redshift.

Big Query takes too long for simple query

Simple count query takes amazingly long time to accomplish.
Am I doing something wrong?
SELECT COUNT(*) FROM `TABLE`
(if someone from bigquery hears this:
jobid: southamerica-east1.bquxjob_6a6df24d_16dfdbe0b54)
There are multiple reasons for a query running slow in BigQuery, as mentioned in the comments - if your table is an external table, that might cause an issue as well. If timing is critical for you and the queries you have are extremely simple, you might want to consider using Cloud SQL which is a realtime database.
BigQuery is normally used for larger more complex queries over very large datasets. If you have a support package, you might want to reach the Google Cloud Support team to have a look at the query to understand why it is running so slow.
Another workaround, just in case you only want to know the number of rows, could be to query the metadata:
SELECT table_id, row_count, size_bytes
FROM `PROJECT_ID.DATASET.__TABLES__` where table_id = 'your_table'

Cassandra CQL - NoSQL or SQL

I am pretty new to Cassandra, just started learning Cassandra a week ago.
I first read, that it was a NoSQL, but when i started using CQL,
I started to wonder whether Cassandra is a NoSQL or SQL DB?
Can someone explain why CQL is more or less like SQL?
CQL is declarative like SQL and the very basic structure of the query component of the language (select things where condition) is the same. But there are enough differences that one should not approach using it in the same way as conventional SQL.
The obvious items: 1. There are no joins or subqueries. 2. No transactions
Less obvious but equally important to note:
Except for the primary key, you can only apply a WHERE condition on a column if you have created an index on that column. In SQL, you don't have to index a column to filter on it but in CQL the select statement will fail outright.
There are no OR or NOT logical operators, only AND. It is very important to model your data so you won't need these two; it is very easy to accidentally forget.
Date handling is profoundly different. CQL permits ONLY the equal operator for timestamps so extremely common and useful expressions like this do not work: where dateField > TO_TIMESTAMP('2013-01-01','YYYY-MM-DD') Also, CQL does not permit string insert of dates accurate to millis (seconds only) -- but it does permit entry of millis since epoch as a long int -- which most other DB engines do NOT permit. Lastly, timezone (as GMT offset) is invisibly captured for both long millis and string formats without a timezone. This can lead to confusion for those systems that deliberately do not conflate local time + GMT offset.
You can ONLY update a table based on primary key (or an IN list of primary keys). You cannot update based on other column data, nor can you do a mass update like this: update table set field = value; CQL demands a where clause with the primary key.
Grammar for AND does not permit parens. TO be fair, it's not necessary because of the lack of the OR operator but this means traditional SQL rewriters that add "protective" parens around expressions will not work with CQL, e.g.: select * from www where (str1 = 'foo2') and (dat1 = 12312442);
In general, it is best to use Cassandra as a big, resilient permastore of data for which a small number of very high level, very high performance queries can be applied to drag out a subset of data to work with at the application layer. That subset might be 1 million rows, yes. CQL and the Cassandra model is not designed for 2 page long SELECT statements with embedded cases, aggregations, etc. etc.
For all intents and purposes, CQL is SQL, so in the strictest sense Cassandra is an SQL database. However, most people closely associate SQL with the relational databases it is usually applied to. Under this (mis)interpretation, Cassandra should not be considered an "SQL database" since it is not relational, and does not support ACID properties.
Docs for CQLV3.0
CQL DESCRIBE to get schema of keyspace, column family, cluster
CQL Doesn't support some stuffs I had known in SQL like joins group by triggers cursors procedure transactions stored procedures
CQL3.0 Supports ORDER BY
CQL Supports all DML and DDL functionalities
CQL Supports BATCH
BATCH is not an analogue for SQL ACID transactions.
Just the DOC mentioned above is a best reference :)

Has anyone written a higher level query langage (than sql) that generates sql for common tasks, on limited schemas

Sql is the standard in query languages, however it is sometime a bit verbose. I am currently writing limited query language that will make my common queries quicker to write and with a bit less mental overhead.
If you write a query over a good database schema, essentially you will be always joining over the primary key, foreign key fields so I think it should be unnecessary to have to state them each time.
So a query could look like.
select s.name, region.description from shop s
where monthly_sales.amount > 4000 and s.staff < 10
The relations would be
shop -- many to one -- region,
shop -- one to many -- monthly_sales
The sql that would be eqivilent to would be
select distinct s.name, r.description
from shop s
join region r on shop.region_id = region.region_id
join monthly_sales ms on ms.shop_id = s.shop_id
where ms.sales.amount > 4000 and s.staff < 10
(the distinct is there as you are joining to a one to many table (monthly_sales) and you are not selecting off fields from that table)
I understand that original query above may be ambiguous for certain schemas i.e if there the two relationship routes between two of the tables. However there are ways around (most) of these especially if you limit the schema allowed. Most possible schema's are not worth considering anyway.
I was just wondering if there any attempts to do something like this?
(I have seen most orm solutions to making some queries easier)
EDIT: I actually really like sql. I have used orm solutions and looked at linq. The best I have seen so far is SQLalchemy (for python). However, as far as I have seen they do not offer what I am after.
Hibernate and LinqToSQL do exactly what you want
I think you'd be better off spending your time just writing more SQL and becoming more comfortable with it. Most developers I know have gone through just this progression, where their initial exposure to SQL inspires them to bypass it entirely by writing their own ORM or set of helper classes that auto-generates the SQL for them. Usually they continue adding to it and refining it until it's just as complex (if not more so) than SQL. The results are sometimes fairly comical - I inherited one application that had classes named "And.cs" and "Or.cs", whose main functions were to add the words " AND " and " OR ", respectively, to a string.
SQL is designed to handle a wide variety of complexity. If your application's data design is simple, then the SQL to manipulate that data will be simple as well. It doesn't make much sense to use a different sort of query language for simple things, and then use SQL for the complex things, when SQL can handle both kinds of thing well.
I believe that any (decent) ORM would be of help here..
Entity SQL is slightly higher level (in places) than Transact SQL. Other than that, HQL, etc. For object-model approaches, LINQ (IQueryable<T>) is much higher level, allowing simple navigation:
var qry = from cust in db.Customers
select cust.Orders.Sum(o => o.OrderValue);
etc
Martin Fowler plumbed a whole load of energy into this and produced the Active Record pattern. I think this is what you're looking for?
Not sure if this falls in what you are looking for but I've been generating SQL dynamically from the definition of the Data Access Objects; the idea is to reflect on the class and by default assume that its name is the table name and all properties are columns. I also have search criteria objects to build the where part. The DAOs may contain lists of other DAO classes and that directs the joins.
Since you asked for something to take care of most of the repetitive SQL, this approach does it. And when it doesn't, I just fall back on handwritten SQL or stored procedures.