Analytical Function Of Oracle vs MariaDB - sql

I am currently migrating my Oracle queries to MariaDB 10.4 and I'm having a hard time with an analytic function.
MARIADB Code:
select cgi, timestamp, hour, rat_type, dl_tput,
ntile(24) over (partition by timestamp,rat_type order by dl_tput) as dl_tput_ntiled
from (select cgi, date(timestamp) as timestamp,
date_format(timestamp,'%H') as hour, rat_type, avg(avg_mean_down) as dl_tput
from JUST_TEST_A
where avg_mean_down is not null
group by cgi, date(timestamp),date_format(timestamp,'%H'),rat_type
) x ;
This code works fine, but after validating the output, the result from Oracle is different from the result from MariaDB (same data).
My Oracle script has this expression that I removed in MariaDB:
select cgi, timestamp, hour, rat_type, dl_tput,
ntile(24) over (partition by timestamp,rat_type order by dl_tput) as dl_tput_ntiled,
count(*) over () as dl_tput_cnt
from (...)
Does count(*) over () affect my output? What is the MariaDB equivalent of this analytic function?

In this DEMO here:
https://dbfiddle.uk/?rdbms=oracle_11.2&fiddle=be74a9fa9059d6b80a8cb10d102355d4
you will find a small example where I have entered some data and there is a query :
select pk
, a
, b
, ntile(24) over (partition by a, b order by pk)
, count(*) over () as dl_tput_cnt
from t1;
In the upper left corner there is a list of different databases you can choose from. Whether you choose Oracle or MariaDB for this example, the result will be the same. This is not a guarantee that count and ntile are not responsible for your different results, but please do tell us more about those results:
but after validating the output the result from Oracle is different
from the result of MariaDB (same data)
Maybe this will help more for your case.
One more thing.
I think you should check the results you are getting from MariaDB for these queries:
SELECT date(timestamp) FROM JUST_TEST_A
SELECT date_format(timestamp,'%H') FROM JUST_TEST_A
and compare them with the old results you received from Oracle for Oracle's equivalent functions for date and date_format.

I suspect that this is an issue with ties. ntile() will split rows with the same values across different buckets, because its mandate is for each bucket to be (nearly) the same size.
As an extreme example, if all the values of dl_tput were the same, then any value from 1 to 24 could be assigned.
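The ties behavior described above can be demonstrated in any engine with window functions. A minimal sketch using Python's sqlite3 (SQLite 3.25+ supports NTILE; the table and values are hypothetical stand-ins for dl_tput):

```python
import sqlite3

# Hypothetical table: eight rows that all share the same dl_tput value.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (pk INTEGER, dl_tput REAL)")
con.executemany("INSERT INTO t VALUES (?, ?)", [(i, 5.0) for i in range(8)])

# Even though every row ties on dl_tput, NTILE(4) must still fill 4 buckets
# of equal size, so the tied rows get spread across buckets arbitrarily.
rows = con.execute(
    "SELECT pk, NTILE(4) OVER (ORDER BY dl_tput) AS bucket FROM t"
).fetchall()
buckets = sorted({b for _, b in rows})
print(buckets)  # [1, 2, 3, 4]
```

Since the assignment among tied rows is arbitrary, two engines (or even two runs) can legitimately disagree on which bucket a given tied row lands in, which matches the Oracle-vs-MariaDB discrepancy described.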

Related

How to use multiple count distinct in the same query with other columns in Druid SQL?

I'm trying to use three projections in the same query, like below, in a Druid environment:
select
__time,
count(distinct col1),
count(distinct case when (condition1 and condition2) then concat(col2, TIME_FORMAT(__time)) else 0 end)
from table
where condition3
GROUP BY __time
But instead I get an error saying - Unknown exception / Cannot build plan for query
It seems to work perfectly fine when I put just one count(distinct) in the query.
How can this be resolved?
As a workaround, you can do multiple subqueries and join them. Something like:
SELECT x.__time, x.delete_cnt, y.added_cnt
FROM
(
SELECT FLOOR(__time to HOUR) __time, count(distinct deleted) delete_cnt
FROM wikipedia
GROUP BY 1
)x
JOIN
(
SELECT FLOOR(__time to HOUR) __time, count( distinct added) added_cnt
FROM wikipedia
GROUP BY 1
)y ON x.__time = y.__time
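The join-of-subqueries pattern above is portable SQL; here is a minimal sketch of the same shape in SQLite via Python (standing in for Druid, with FLOOR(__time TO HOUR) replaced by a plain ts column and hypothetical data):

```python
import sqlite3

# Hypothetical miniature of the wikipedia table from the answer above.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE wikipedia (ts INTEGER, deleted TEXT, added TEXT)")
con.executemany("INSERT INTO wikipedia VALUES (?, ?, ?)", [
    (0, "a", "x"), (0, "a", "y"), (0, "b", "x"),
    (1, "a", "x"), (1, "a", "z"),
])

# Each subquery computes exactly one COUNT(DISTINCT ...) per group, then the
# two results are joined on the grouping key, sidestepping the one-distinct-
# count-per-query limitation.
rows = con.execute("""
    SELECT x.ts, x.delete_cnt, y.added_cnt
    FROM (SELECT ts, COUNT(DISTINCT deleted) AS delete_cnt
          FROM wikipedia GROUP BY ts) x
    JOIN (SELECT ts, COUNT(DISTINCT added) AS added_cnt
          FROM wikipedia GROUP BY ts) y ON x.ts = y.ts
    ORDER BY x.ts
""").fetchall()
print(rows)  # [(0, 2, 2), (1, 1, 2)]
```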
As the Druid documentation points out:
COUNT(DISTINCT expr) Counts distinct values of expr, which can be string, numeric, or hyperUnique. By default this is approximate, using a variant of HyperLogLog. To get exact counts set "useApproximateCountDistinct" to "false". If you do this, expr must be string or numeric, since exact counts are not possible using hyperUnique columns. See also APPROX_COUNT_DISTINCT(expr). In exact mode, only one distinct count per query is permitted.
So this is a Druid limitation: you either need to disable exact mode, or else limit yourself to one distinct count per query.
On a side note, other databases typically do not have this limitation. Apache Druid is designed for high-performance real-time analytics, and as a result its implementation of SQL has some restrictions. Internally, Druid uses a JSON-based query language; the SQL interface is powered by a parser and planner based on Apache Calcite, which translates SQL into native Druid queries.

how to group by sql data in a sub-query to show evolution

I have an SQLite database which contains URLs and the number of times each has been visited per week. It's stored this way:
uid, url, date, nb_visits, ...
I would like to get every single URL with the evolution of the number of visits, grouped by date.
Something that could look like:
"my_url_1","45,54,76,36,78"
Here my assumption is that there are 5 dates stored in the db; I don't need the dates, I just want the visits ordered from old to recent.
I tried something like this, but it doesn't accept 2 fields in the second select:
select url, visits,
(select sum(visits) from data d2 where d2.url=d1.url group by date) as evol
from data d1
where date = (select max(date) from data);
This query isn't working; I just wanted to share what I'm trying to do.
Thanks for the help.
What you want is the result of GROUP_CONCAT() for each url.
The problem with the aggregate function GROUP_CONCAT() is that it does not support an ORDER BY clause, which you need to sort the visits by date, so the result that you would get is not guaranteed to be correct.
In SQLite there is also a GROUP_CONCAT() window function which supports an ORDER BY clause and can be used to return the result that you want:
SELECT DISTINCT url,
GROUP_CONCAT(visits) OVER (
PARTITION BY url
ORDER BY date
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) visits
FROM data
See a simplified demo.
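The windowed GROUP_CONCAT above can be run directly in SQLite; a minimal sketch via Python, with a hypothetical data table matching the question's columns:

```python
import sqlite3

# Hypothetical data table: url, date, visits per week.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE data (url TEXT, date TEXT, visits INTEGER)")
con.executemany("INSERT INTO data VALUES (?, ?, ?)", [
    ("my_url_1", "2020-01-01", 45), ("my_url_1", "2020-01-08", 54),
    ("my_url_1", "2020-01-15", 76), ("my_url_2", "2020-01-01", 10),
])

# The window form of GROUP_CONCAT accepts ORDER BY; the frame covers the
# whole partition so every row of a url carries the full ordered list, and
# DISTINCT collapses the duplicates down to one row per url.
rows = con.execute("""
    SELECT DISTINCT url,
           GROUP_CONCAT(visits) OVER (
               PARTITION BY url ORDER BY date
               ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
           ) AS visits
    FROM data
""").fetchall()
print(sorted(rows))  # [('my_url_1', '45,54,76'), ('my_url_2', '10')]
```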
Are you looking for group_concat()?
select url, group_concat(visits)
from (select d.*
from data d
order by url, date
) d
group by url;
Note: SQLite doesn't support an order by as part of the group_concat() aggregate syntax. In theory, sorting in the subquery should have the desired effect of ordering the dates, but this is not guaranteed.

T-SQL Query to SELECT rows with same values of several columns (Azure SQL Database)

I need help with writing a T-SQL query on the table shown in the picture below. The table has ambiguous info about buildings: some of them appear more than once, which is wrong. I need to select only the rows that have the same street and building values, so I can manually delete the bad rows. So I want to select rows 1, 2, 4, 5 in the picture below. I use an Azure SQL Database, which has some limitations on T-SQL.
I'm pretty sure Azure supports subqueries and window functions. So, try this:
select t.*
from (select t.*, count(*) over (partition by street, building) as cnt
from table t
) t
where cnt > 1;
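The same pattern runs anywhere window functions exist; a minimal sketch in SQLite via Python, with a hypothetical buildings table shaped like the question's screenshot (rows 1, 2, 4, 5 duplicated):

```python
import sqlite3

# Hypothetical table mirroring the question: rows 1/2 and 4/5 are duplicates.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE buildings (id INTEGER, street TEXT, building TEXT)")
con.executemany("INSERT INTO buildings VALUES (?, ?, ?)", [
    (1, "Main St", "12"), (2, "Main St", "12"),
    (3, "Oak Ave", "7"),
    (4, "Elm Rd", "3"), (5, "Elm Rd", "3"),
])

# COUNT(*) OVER (PARTITION BY street, building) tags each row with how many
# rows share its (street, building) pair; cnt > 1 keeps only the duplicates.
rows = con.execute("""
    SELECT id, street, building
    FROM (SELECT b.*, COUNT(*) OVER (PARTITION BY street, building) AS cnt
          FROM buildings b) t
    WHERE cnt > 1
    ORDER BY id
""").fetchall()
print([r[0] for r in rows])  # [1, 2, 4, 5]
```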

counting rows in select clause with DB2

I would like to query a DB2 table and get all the results of a query in addition to all of the rows returned by the select statement in a separate column.
E.g., if the table contains columns 'id' and 'user_id', assuming 100 rows, the result of the query would appear in this format: (id) | (user_id) | 100.
I do not wish to use a 'group by' clause in the query. (Just in case you are confused about what I am asking.) Also, I could not find an example here: http://mysite.verizon.net/Graeme_Birchall/cookbook/DB2V97CK.PDF.
Also, if there is a more efficient way of getting both these results (values + count), I would welcome any ideas. My environment uses zend framework 1.x, which does not have an ODBC adapter for DB2. (See issue http://framework.zend.com/issues/browse/ZF-905.)
If I understand what you are asking for, then the answer should be
select t.*, g.tally
from mytable t,
(select count(*) as tally
from mytable
) as g;
If this is not what you want, then please give an actual example of desired output, supposing there are 3 to 5 records, so that we can see exactly what you want.
You would use window/analytic functions for this:
select t.*, count(*) over() as NumRows
from table t;
This will work for whatever kind of query you have.
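To see the window form in action, here is a minimal sketch in SQLite via Python (standing in for DB2, with a hypothetical two-column table matching the question's id/user_id example):

```python
import sqlite3

# Hypothetical table with the question's columns: id and user_id.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE mytable (id INTEGER, user_id INTEGER)")
con.executemany("INSERT INTO mytable VALUES (?, ?)",
                [(1, 10), (2, 20), (3, 30)])

# COUNT(*) OVER () appends the total row count to every row, with no GROUP BY.
rows = con.execute(
    "SELECT id, user_id, COUNT(*) OVER () AS num_rows FROM mytable ORDER BY id"
).fetchall()
print(rows)  # [(1, 10, 3), (2, 20, 3), (3, 30, 3)]
```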

Aggregate functions in WHERE clause in SQLite

Simply put, I have a table with, among other things, a column for timestamps. I want to get the row with the most recent (i.e. greatest value) timestamp. Currently I'm doing this:
SELECT * FROM table ORDER BY timestamp DESC LIMIT 1
But I'd much rather do something like this:
SELECT * FROM table WHERE timestamp=max(timestamp)
However, SQLite rejects this query:
SQL error: misuse of aggregate function max()
The documentation confirms this behavior (bottom of page):
Aggregate functions may only be used in a SELECT statement.
My question is: is it possible to write a query to get the row with the greatest timestamp without ordering the select and limiting the number of returned rows to 1? This seems like it should be possible, but I guess my SQL-fu isn't up to snuff.
SELECT * from foo where timestamp = (select max(timestamp) from foo)
or, if SQLite insists on treating subselects as sets,
SELECT * from foo where timestamp in (select max(timestamp) from foo)
There are many ways to skin a cat.
If you have an Identity Column that has an auto-increment functionality, a faster query would result if you return the last record by ID, due to the indexing of the column, unless of course you wish to put an index on the timestamp column.
SELECT * FROM TABLE ORDER BY ID DESC LIMIT 1
I think I've answered this question 5 times in the past week now, but I'm too tired to find a link to one of those right now, so here it is again...
SELECT
*
FROM
table T1
LEFT OUTER JOIN table T2 ON
T2.timestamp > T1.timestamp
WHERE
T2.timestamp IS NULL
You're basically looking for the row where no other row matches that is later than it.
NOTE: As pointed out in the comments, this method will not perform as well in this kind of situation. It will usually work better (for SQL Server at least) in situations where you want the last row for each customer (as an example).
You can simply do:
SELECT *, max(timestamp) FROM table
Edit:
An aggregate function can't be used like this, so it gives an error. I guess what SquareCog suggested was the best thing to do:
SELECT * FROM table WHERE timestamp = (select max(timestamp) from table)
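The subquery form works directly in SQLite; a minimal sketch via Python with a hypothetical foo table (the column is named ts here to avoid any confusion with the timestamp keyword in other dialects):

```python
import sqlite3

# Hypothetical table with an id and a timestamp-like column ts.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE foo (id INTEGER, ts INTEGER)")
con.executemany("INSERT INTO foo VALUES (?, ?)",
                [(1, 100), (2, 300), (3, 200)])

# The aggregate lives in a scalar subquery, so the outer WHERE is legal.
row = con.execute(
    "SELECT * FROM foo WHERE ts = (SELECT MAX(ts) FROM foo)"
).fetchone()
print(row)  # (2, 300)
```

Note that if several rows tie on the maximum ts, this form returns all of them, unlike ORDER BY ... LIMIT 1, which returns an arbitrary one.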