ORA-01427 - Need the counts of each value - SQL

I get "ORA-01427: single-row subquery returns more than one row" when I run the following query:
select count(*)
from table1
where to_char(timestamp,'yyyymmddhh24') = to_char(sysdate-1/24,'yyyymmddhh24')
and attribute = (select distinct attribute from table2);
I want to get the counts of each value of attribute in the specific time frame.

I would recommend writing this as:
select count(*)
from table1 t1
where timestamp >= trunc(sysdate-1/24, 'HH') and
timestamp < trunc(sysdate, 'HH') and
exists (select 1 from table2 t2 where t2.attribute = t1.attribute);
This formulation makes it easier to use indexes and statistics for optimizing the query. Also, select distinct is not needed in a subquery used with in or exists (although I think Oracle will optimize away the distinct).
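For reference, if you prefer the in form over exists, the subquery does not need the distinct. A sketch of that variant:
select count(*)
from table1 t1
where timestamp >= trunc(sysdate-1/24, 'HH') and
timestamp < trunc(sysdate, 'HH') and
t1.attribute in (select attribute from table2);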
EDIT:
You appear to want to aggregate by attribute as well:
select t1.attribute, count(*)
from table1 t1
where timestamp >= trunc(sysdate-1/24, 'HH') and
timestamp < trunc(sysdate, 'HH') and
exists (select 1 from table2 t2 where t2.attribute = t1.attribute)
group by t1.attribute;

You can do it with a join and GROUP BY:
SELECT
count(*) AS Cnt
, a.attribute
FROM table1 t
JOIN table2 a ON t.attribute=a.attribute
WHERE to_char(t.timestamp,'yyyymmddhh24') = to_char(sysdate-1/24,'yyyymmddhh24')
GROUP BY a.attribute
This produces a row for each distinct attribute from table2, paired up with the corresponding count from table1.
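Note that if table2 can contain the same attribute more than once, this join multiplies the matching table1 rows and inflates the counts. A sketch that guards against that by joining to the distinct attributes instead:
SELECT
a.attribute
, count(*) AS Cnt
FROM table1 t
JOIN (SELECT DISTINCT attribute FROM table2) a ON t.attribute=a.attribute
WHERE to_char(t.timestamp,'yyyymmddhh24') = to_char(sysdate-1/24,'yyyymmddhh24')
GROUP BY a.attribute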

Related

Join complementary tables in PostgreSQL

Let's say I have two tables table1 and table2, with three columns each: id, time, value. They store the same kind of information, i.e. 30-minute timeseries data for several ids (imagine a machine that produces an amount of energy per day). table2 contains more precise information than table1, but not for all timestamps nor for all ids.
I want to get the best out of the two tables over the period defined by table2, i.e. use data from table2 when available and fall back to table1 when required (to add some more complexity, table1 is not a real table but a view that takes a very long time to compute fully, so I want to avoid computing it in its entirety).
I thought I could define a perimeter of id-time pairs to indicate which value should be kept each day (the daily scale should be equivalent to the 30-minute timestamps, and be less resource-consuming). Thus I went for:
with perimeter_per_day_table2 as (
select distinct
id,
date_trunc('day', time) as day
from table2
),
perimeter_per_day_table1 as (
select id,
date_trunc('day', time) as day
from table1
where day >= (select min(time) from table2)
and day <= (select max(time) from table2)
and (id, day) not in (select id, day from perimeter_per_day_table2)
)
select * from perimeter_per_day_table1
but that takes a very long time. In particular, it seems like the condition where (id, day) not in (select id, day from perimeter_per_day_table2) is very hard for PostgreSQL to handle.
Any suggestions?
Indeed, NOT IN isn't optimized as well as NOT EXISTS in Postgres, so an equivalent not exists () condition is typically faster.
However, in neither case do you need to apply a (costly) DISTINCT on the rows in the sub-query.
with perimeter_per_day_table1 as (
select t1.id,
date_trunc('day', t1.time) as day
from table1 t1
where date_trunc('day', t1.time) >= (select min(time) from table2)
and date_trunc('day', t1.time) <= (select max(time) from table2)
and not exists (select *
from table2 t2
where t1.id = t2.id
and date_trunc('day', t1.time) = date_trunc('day', t2.time))
)
select *
from perimeter_per_day_table1;
You can even avoid querying table2 twice for the min/max, but I doubt that will make a huge difference if there is an index on the time column:
with min_max as (
select min(time) as min_time,
max(time) as max_time
from table2
), perimeter_per_day_table1 as (
select t1.id,
date_trunc('day', t1.time) as day
from table1 t1
cross join min_max
where date_trunc('day', t1.time) >= min_max.min_time
and date_trunc('day', t1.time) <= min_max.max_time
and not exists (select *
from table2 t2
where t1.id = t2.id
and date_trunc('day', t1.time) = date_trunc('day', t2.time))
)
select *
from perimeter_per_day_table1;
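If this is still slow, an expression index on table2 can speed up the correlated not exists probe. This is only a sketch, and it assumes the time column is a plain timestamp (without time zone), since date_trunc('day', ...) on timestamptz is not immutable and cannot be indexed directly:
-- hypothetical index name; adjust to your schema
create index table2_id_day_idx on table2 (id, (date_trunc('day', time)));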

Math operations between queries in Impala SQL

I need to divide the results coming from two different queries in Impala through the Hue editor.
The query I wrote in Oracle is shown below:
select
(select count(distinct t1.ids)
from table1 t1
where extract(year from t1.insertdate)=2020)
/
(select count(distinct t2.ids)
from table2 t2
where extract(year from t2.insertdate)=2019)
from dual
On Impala the same query does not work due to the "/" operator. Can you please explain how to do the same thing in Impala SQL?
You can join them on a dummy column and then divide the result sets.
SELECT cnt1.cnt1/cnt2.cnt2
FROM
(SELECT count(DISTINCT t1.ids) cnt1, 'dummy' dum
FROM table1 t1
WHERE YEAR (t1.insertdate)=2020) cnt1
JOIN
(SELECT count(DISTINCT t2.ids) cnt2, 'dummy' dum
FROM table2 t2
WHERE YEAR (t2.insertdate)=2019) cnt2
ON cnt1.dum= cnt2.dum -- dummy column
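Since each derived table returns exactly one row, you could also use a plain cross join and skip the dummy column. A sketch of that variant (Impala supports CROSS JOIN):
SELECT c1.cnt / c2.cnt
FROM
(SELECT count(DISTINCT t1.ids) cnt
FROM table1 t1
WHERE year(t1.insertdate)=2020) c1
CROSS JOIN
(SELECT count(DISTINCT t2.ids) cnt
FROM table2 t2
WHERE year(t2.insertdate)=2019) c2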

SQL/Impala: combined multiple query (with different where clause) into one

I have the following query:
'select team, count(distinct id) as distinct_id_count_w1 from myTable where timestamp > t1 and timestamp < t2 group by team'
'select team, count(distinct id) as distinct_id_count_w2 from myTable where timestamp > t2 and timestamp < t3 group by team'
Is it possible to combine these two queries into one? Thanks!
Easily :) This should work on most common DB engines:
select team, count(distinct id) as distinct_id_count_w1, null as distinct_id_count_w2 from myTable where timestamp > t1 and timestamp < t2 group by team
UNION ALL
select team, null as distinct_id_count_w1, count(distinct id) as distinct_id_count_w2 from myTable where timestamp > t2 and timestamp < t3 group by team
As Edamame stated, you may want to read both results per team. That was not clear from the question itself, but may be solved this way:
SELECT
COALESCE(interval1.team, interval2.team) AS team,
interval1.distinct_id_count_w1,
interval2.distinct_id_count_w2
FROM (
select team, count(distinct id) as distinct_id_count_w1 from myTable where timestamp > t1 and timestamp < t2 group by team
) AS interval1
FULL OUTER JOIN
(
select team, count(distinct id) as distinct_id_count_w2 from myTable where timestamp > t2 and timestamp < t3 group by team
) AS interval2
ON interval1.team = interval2.team
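Depending on your Impala version, you may also be able to get both counts in a single pass with conditional aggregation. Treat this as a sketch to verify against your version, since older Impala releases restricted multiple COUNT(DISTINCT ...) expressions in one query block:
select team,
count(distinct case when timestamp > t1 and timestamp < t2 then id end) as distinct_id_count_w1,
count(distinct case when timestamp > t2 and timestamp < t3 then id end) as distinct_id_count_w2
from myTable
where timestamp > t1 and timestamp < t3
group by team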
If you know the two result sets are already distinct from each other, you should use UNION ALL; with plain UNION, SQL will de-duplicate the combined result, which hurts query performance.

PostgreSQL Selecting Most Recent Entry for a Given ID

The table essentially looks like:
Serial-ID, ID, Date, Data, Data, Data, etc.
There can be multiple rows for the same ID. I'd like to create a view of this table to be used in reports that only shows the most recent entry for each ID. It should show all of the columns.
Can someone help me with the SQL select? Thanks.
There are about 5 different ways to do this, but here's one:
SELECT *
FROM yourTable AS T1
WHERE NOT EXISTS(
SELECT *
FROM yourTable AS T2
WHERE T2.ID = T1.ID AND T2.Date > T1.Date
)
And here's another:
SELECT T1.*
FROM yourTable AS T1
LEFT JOIN yourTable AS T2 ON
(
T2.ID = T1.ID
AND T2.Date > T1.Date
)
WHERE T2.ID IS NULL
One more:
WITH T AS (
SELECT *, ROW_NUMBER() OVER(PARTITION BY ID ORDER BY Date DESC) AS rn
FROM yourTable
)
SELECT * FROM T WHERE rn = 1
OK, I'm getting carried away, here's the last one I'll post (for now):
WITH T AS (
SELECT ID, MAX(Date) AS latest_date
FROM yourTable
GROUP BY ID
)
SELECT yourTable.*
FROM yourTable
JOIN T ON T.ID = yourTable.ID AND T.latest_date = yourTable.Date
I would use DISTINCT ON
CREATE VIEW your_view AS
SELECT DISTINCT ON (id) *
FROM your_table a
ORDER BY id, date DESC;
This works because DISTINCT ON keeps only the first row for each value of the expression in parentheses. With ORDER BY id, date DESC, the row with the latest date sorts first within each id and is therefore the one that shows up in the result.
https://www.postgresql.org/docs/10/static/sql-select.html#SQL-DISTINCT
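If date alone can tie within an id, which row DISTINCT ON keeps is arbitrary. Assuming you want the highest serial id to win, you could add it as a tiebreaker (the column name below is guessed from the question):
CREATE VIEW your_view AS
SELECT DISTINCT ON (id) *
FROM your_table a
-- serial_id is a guessed name for the "Serial-ID" column
ORDER BY id, date DESC, serial_id DESC;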
This seems like a good use for correlated subqueries:
CREATE VIEW your_view AS
SELECT *
FROM your_table a
WHERE date = (
SELECT MAX(date)
FROM your_table b
WHERE b.id = a.id
)
Your date column would need to uniquely identify each row within an id (something like a TIMESTAMP type).

Redundancy in doing sum()

table1 -> id, time_stamp, value
This table contains 10 ids. Each id has a value for each hour of the day.
So for 1 day, there would be 240 records in this table.
table2 -> id
Table2 consists of a dynamically changing subset of id's present in table1.
At any given instant, the intention is to get sum(value) from table1, considering only the ids present in table2,
grouping by each hour of that day, giving the summed values a rank, and repeating this for each day.
The query is at this stage:
select time_stamp, sum(value),
rank() over (partition by trunc(time_stamp) order by sum(value) desc) rn
from table1
where exists (select t2.id from table2 t2 where id=t2.id)
and
time_stamp >= to_date('05/04/2010 00','dd/mm/yyyy hh24') and
time_stamp <= to_date('25/04/2010 23','dd/mm/yyyy hh24')
group by time_stamp
order by time_stamp asc
If the query is correct, can it be made more efficient, considering that table1 will actually contain thousands of ids instead of 10?
EDIT: I am using sum(value) twice in the query and have not found a way to compute the sum only once. Please help with this as well.
from table1
where exists (select t2.id from table2 t2 where value=t2.value)
table2 doesn't have a value field. Why does the query above use t2.value?
You could use a join here
from table1 t1 join table2 t2 on t1.id = t2.id
EDIT: It's been a while since I worked with Oracle. Pardon me if my comment on t2.value doesn't make sense.
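To address the EDIT about writing sum(value) only once, one option is to aggregate in an inline view and rank over the result in an outer query. This is only a sketch, and it assumes id is unique in table2 (otherwise stick with the exists form to avoid double counting):
select time_stamp, total_value,
rank() over (partition by trunc(time_stamp) order by total_value desc) rn
from (
select t1.time_stamp, sum(t1.value) as total_value
from table1 t1
join table2 t2 on t2.id = t1.id
where t1.time_stamp >= to_date('05/04/2010 00','dd/mm/yyyy hh24')
and t1.time_stamp <= to_date('25/04/2010 23','dd/mm/yyyy hh24')
group by t1.time_stamp
)
order by time_stamp;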