Use Ruby To Convert Any Query To Count Query - sql

In my application I use PG to execute queries defined by the user in-app.
require 'pg'
database = PG.connect(*credentials)
query = 'select id, created_at from users where id % 2 = 0'
database.exec(query)
Part of the application requires fetching a count before running the actual query, so I use a regex to convert the query into a count query. (Assume LIMIT and ORDER BY are not allowed.)
query = 'select id, created_at from users where id % 2 = 0'
query.gsub!(%r{(?<=SELECT)[^\/]+(?=FROM)}i, ' count(*) ')
count = database.exec(query).first['count'].to_i
But if the query includes CTEs and/or subqueries...
query = 'with new_table as (select id from users where id % 2 = 0)
select created_at, name from users where id in (select * from new_table)'
the above regex doesn't work, and I haven't been able to figure out another regex-based solution.
Using SQL, Ruby, or REGEX, how could I convert any query a read-only db user could perform into a count query WITHOUT wrapping the query in its own CTE or just running the query and counting the results?
More simply, given a query, how can one get the row count for that query without actually running the full query?
Any engineers at Looker, PeriscopeData, or Mode should have this one in the bag :-)

Modifying SQL queries with a regex is a non-starter for all the reasons you don't try to modify XML with a regex: you need something which understands the grammar. What you're looking for is a SQL Query Builder.
A SQL Query Builder is sort of like an ORM without the ORM. You use it to write SQL queries with method calls instead of strings, but you don't have to tell it about all your tables and columns as you would with an ORM, nor do you have to make classes for all your tables. It just makes SQL queries.
Your query is held as an object and only turned into SQL when it's time to communicate with the database. If you want to modify the query, you do it with method calls and regenerate the SQL. So you can add where clauses and group bys and limits, add more columns to the select, join tables and, yes, count.
They will also often smooth over SQL incompatibilities for you, so the same code can run on MySQL or SQLite or PostgreSQL or Oracle.
A good non-Ruby one is Knex.js in JavaScript. I'm having a hard time finding a pure SQL Query Builder for Ruby. The ones I've found (ActiveRecord, Sequel, squeel, and Arel) are all ORMs and require you to set up classes and schemas and all that. The only one I could find for Ruby that isn't an ORM is qdsl.

Easy way: create a new query from your existing one by simply wrapping it so that it becomes a subquery:
require 'pg'
database = PG.connect(*credentials)
query = 'select id, created_at from users where id % 2 = 0'
# Create a `count` query, based on the existing one.
# The original query must NOT end with ';'
count_query = 'SELECT count(*) AS count FROM (' + query + ') AS q0'
database.exec(count_query)
# Follow on
database.exec(query)
This may not be the most efficient way of getting the count, but I think it is the simplest and (probably) the least error-prone. It assumes that the original query is well-formed, and that it does not call functions with side effects [a practice that should be reserved for very few use-cases].
Assuming the users table is similar to:
CREATE TABLE users AS
SELECT * FROM
(VALUES
(1::integer, 'name1'::text, now()::timestamp),
(2, 'name2', now() - interval '1 hour'),
(3, 'name3', now() - interval '2 hours'),
(4, 'name4', now() - interval '2 hours')
) x(id, name, created_at) ;
This solution works with WITH clauses, because the following SQL query is legal:
SELECT count(*) FROM
(
with new_table as (select id from users where id % 2 = 0)
select created_at, name from users where id in (select * from new_table)
) AS q0 ;
... and returns 2 as expected.

Related

How to use multiple count distinct in the same query with other columns in Druid SQL?

I'm trying to use three projections in the same query, like below, in a Druid environment:
select
__time,
count(distinct col1),
count(distinct case when condition1 and condition2 then concat(col2, TIME_FORMAT(__time)) else 0 end)
from table
where condition3
GROUP BY __time
But instead I get an error: Unknown exception / Cannot build plan for query.
It seems to work perfectly fine when I put just one count(distinct) in the query.
How can this be resolved?
As a workaround, you can do multiple subqueries and join them. Something like:
SELECT x.__time, x.delete_cnt, y.added_cnt
FROM
(
  SELECT FLOOR(__time to HOUR) __time, count(distinct deleted) delete_cnt
  FROM wikipedia
  GROUP BY 1
) x
JOIN
(
  SELECT FLOOR(__time to HOUR) __time, count(distinct added) added_cnt
  FROM wikipedia
  GROUP BY 1
) y ON x.__time = y.__time
As the Druid documentation points out:
COUNT(DISTINCT expr) Counts distinct values of expr, which can be string, numeric, or hyperUnique. By default this is approximate, using a variant of HyperLogLog. To get exact counts set "useApproximateCountDistinct" to "false". If you do this, expr must be string or numeric, since exact counts are not possible using hyperUnique columns. See also APPROX_COUNT_DISTINCT(expr). In exact mode, only one distinct count per query is permitted.
So this is a Druid limitation: you either need to disable exact mode, or else limit yourself to one distinct count per query.
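If approximate results are acceptable, the first of those options can be sketched directly in SQL with APPROX_COUNT_DISTINCT, which is always approximate and so is not subject to the one-distinct-count-per-query limit. This is only a sketch, reusing the placeholder column and condition names from the question:
-- Approximate distinct counts; several of these can coexist in one query.
SELECT
  __time,
  APPROX_COUNT_DISTINCT(col1) AS col1_distinct,
  -- No ELSE branch: non-matching rows yield NULL instead of counting the literal 0.
  APPROX_COUNT_DISTINCT(
    CASE WHEN condition1 AND condition2
         THEN CONCAT(col2, TIME_FORMAT(__time)) END
  ) AS col2_distinct
FROM table
WHERE condition3
GROUP BY __time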
On a side note, other databases typically do not have this limitation. Apache Druid is designed for high-performance real-time analytics, and as a result, its implementation of SQL has some restrictions. Internally, Druid uses a JSON-based query language. The SQL interface is powered by a parser and planner based on Apache Calcite, which translates SQL into native Druid queries.

Get count and result from SQL query in Go

I'm running a pretty straightforward query using the database/sql and lib/pq (postgres) packages and I want to toss the results of some of the fields into a slice, but I need to know how big to make the slice.
The only solution I can find is to do another query that is just SELECT COUNT(*) FROM tableName;.
Is there a way to both get the result of the query AND the count of returned rows in one query?
Conceptually, the problem is that the database cursor may not be enumerated to the end, so the database does not really know how many records you will get until you have actually read all of them. The only way to count (in the general case) is to go through all the records in the result set.
But in practice, you can force it to do so by using a subquery like
select *, (select count(*) from table) from table
and just ignore the second column for every record but the first. But this is rather crude and I do not recommend doing so.
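If you do go down that road, a less heavy-handed variant on PostgreSQL (which the question targets) is a window function. This is just a sketch, with mytable standing in for your table:
-- Every row of the result carries the total row count in total_rows;
-- read it from the first row and ignore it afterwards.
SELECT t.*, count(*) OVER () AS total_rows
FROM mytable t;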
Not sure if this is what you are asking for, but in SQL Server you can read @@ROWCOUNT to get the row count of the previous statement that was executed.
SELECT mytable.mycol FROM mytable WHERE mytable.foo = 'bar'
SELECT @@ROWCOUNT
If you want the row count included in your result set you can use the OVER clause (MSDN)
SELECT mytable.mycol, count(*) OVER(PARTITION BY mytable.foo) AS 'Count' FROM mytable WHERE mytable.foo = 'bar'
You could also perhaps just separate the two SQL statements with a ;. This would return a result set for each of the statements executed.
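For example, something along these lines (a sketch reusing the mytable placeholder from above; whether both result sets come back from a single call depends on the client library):
-- Two statements sent together; the client receives two result sets.
SELECT mytable.mycol FROM mytable WHERE mytable.foo = 'bar';
SELECT count(*) AS row_count FROM mytable WHERE mytable.foo = 'bar';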
You would use count(*):
SELECT count(distinct last)
FROM XYZTable
WHERE date(FROM_UNIXTIME(time)) >= '2013-10-28' AND
id = 90 ;

counting rows in select clause with DB2

I would like to query a DB2 table and get all the results of the query plus, in a separate column, the total number of rows returned by the select statement.
E.g., if the table contains columns 'id' and 'user_id', assuming 100 rows, the result of the query would appear in this format: (id) | (user_id) | 100.
I do not wish to use a 'group by' clause in the query (just in case you are confused about what I am asking). Also, I could not find an example here: http://mysite.verizon.net/Graeme_Birchall/cookbook/DB2V97CK.PDF.
Also, if there is a more efficient way of getting both these results (values + count), I would welcome any ideas. My environment uses Zend Framework 1.x, which does not have an ODBC adapter for DB2. (See issue http://framework.zend.com/issues/browse/ZF-905.)
If I understand what you are asking for, then the answer should be
select t.*, g.tally
from mytable t,
(select count(*) as tally
from mytable
) as g;
If this is not what you want, then please give an actual example of desired output, supposing there are 3 to 5 records, so that we can see exactly what you want.
You would use window/analytic functions for this:
select t.*, count(*) over() as NumRows
from table t;
This will work for whatever kind of query you have.
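Applied to the example from the question (a sketch; assuming the table is named mytable and has the columns id and user_id), every row comes back with the same total, 100 in the example, in the last column:
-- Each of the 100 rows carries the total row count in num_rows.
SELECT id, user_id, count(*) OVER () AS num_rows
FROM mytable;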

HQL count from multiple tables

I would like to query my database using a HQL query to retrieve the total number of rows having a MY_DATE greater than SOME_DATE.
So far, I have come up with a native Oracle query to get that result, but I am stuck when writing in HQL:
SELECT
(
SELECT COUNT(MY_DATE)
FROM Table1
WHERE MY_DATE >= TO_DATE('2011-09-07','yyyy-MM-dd')
)
+
(
SELECT COUNT(MY_DATE)
FROM Table2
WHERE MY_DATE >= TO_DATE('2011-09-07','yyyy-MM-dd')
)
AS total
I actually have more than 2 tables, but I keep getting an IllegalArgumentException (unexpected end of subtree).
The working native Oracle query basically ends with FROM dual.
What HQL query should I use to get the total number of rows I want?
First off, if you have a working SQL query, why not just use that instead of trying to translate it to HQL? Since you're returning a single scalar in the first place, it's not like you need anything HQL provides (e.g. dependent entities, etc.).
Secondly, do you have 'dual' mapped in Hibernate? :-) If not, how exactly are you planning on translating that?
That said, "unexpected end of subtree" error is usually caused by idiosyncrasies of Hibernate's AST parser. A commonly used workaround is to prefix the expression with '0 +':
select 0 + (
... nested select #1 ...
) + (
... nested select #2 ...
) as total
from <from what exactly?>

SQL select groups of distinct items in prepared statement?

I have a batch job that I run on a table which I'm sure I could write as a prepared statement. Currently it's all in Java and no doubt less efficient than it could be. For a table like so:
CREATE TABLE thing (
  `tag` varchar(255),
  `document` varchar(255),
  `weight` float
);
I want to create a new table that contains the top N entries for every tag. Currently I do this:
create new table with same schema
select distinct tag
for each tag:
select * limit N insert into the new table
This requires executing a query to get the distinct tags, then selecting the top N items for that tag and inserting them... all very inefficient.
Is there a stored procedure (or even a simple query) that I could use to do this? If dialect is important, I'm using MySQL.
(And yes, I do have my indexes sorted!)
Cheers
Joe
I haven't done this in a while (spoiled by CTEs in SQL Server), and I'm assuming that you want the rows ranked by weight; try
SELECT tag, document, weight
FROM thing
WHERE (SELECT COUNT(*)
       FROM thing AS t
       WHERE t.tag = thing.tag AND t.weight < thing.weight
      ) < N;
I think that will do it.
EDIT: corrected error in code; need < N, not <= N.
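Since the goal is to populate a new table, the same query can be combined with INSERT ... SELECT. This is only a sketch; top_things is a hypothetical name for the new table (same columns as thing), and N is fixed at 3 here:
-- Insert the top 3 rows per tag, as selected by the correlated count above,
-- into the new table in a single statement.
INSERT INTO top_things (tag, document, weight)
SELECT tag, document, weight
FROM thing
WHERE (SELECT COUNT(*)
       FROM thing AS t
       WHERE t.tag = thing.tag AND t.weight < thing.weight) < 3;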
If you were using SQL Server, I would suggest using the ROW_NUMBER function, partitioned by tag and ordered by weight, and selecting the rows where row_number <= N. (So in other words, order and number the rows within each tag group, then pick the top N rows from each group.) I found an article about simulating the ROW_NUMBER function in MySQL here:
http://www.xaprb.com/blog/2006/12/02/how-to-number-rows-in-mysql/
See if this helps you out!
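For reference, the user-variable technique from that article, adapted to the thing table, might look roughly like the following sketch (untested; assumes N = 3, that 'top' means highest weight, and that ties need no special handling):
-- Number the rows within each tag by descending weight using user variables,
-- then keep the first 3 per tag.
SELECT tag, document, weight
FROM (
  SELECT t.tag, t.document, t.weight,
         @rn := IF(@prev_tag = t.tag, @rn + 1, 1) AS rn,
         @prev_tag := t.tag
  FROM thing t
  CROSS JOIN (SELECT @rn := 0, @prev_tag := NULL) vars
  ORDER BY t.tag, t.weight DESC
) ranked
WHERE rn <= 3;
The result of that select can then feed an INSERT ... SELECT into the new table, as in the sketch above.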