OVER() vs Two Queries - Which is Most Efficient - sql

I need to pull back the first 300 rows from a 10MM row table, as well as getting a count of the total number of matching records.
I can do this in two queries, something like:
SELECT * FROM table WHERE field = value LIMIT 300;
SELECT count(*) FROM table WHERE field = value;
Or I could use an OVER():
SELECT *, COUNT(*) OVER() AS total FROM table WHERE field = value LIMIT 300;
Which would be the most efficient? I don't care about the need to run two queries, I'm after the most efficient solution. I'm no expert, and I've tried to run an "explain" but it doesn't make much sense to me. This is running on Amazon Redshift.

If your sort key is the timestamp field, the most efficient query to run will be:
select *
from (
select *, count(*) over () as total,
row_number() over (order by timestamp) as rn
from table
where field = value) t
where rn < 301
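As a quick sanity check on the semantics (not on Redshift performance), the sketch below uses Python's built-in sqlite3 module as a stand-in engine. It assumes SQLite 3.25 or newer for window functions; the `events` table and its contents are made up for illustration. It suggests that COUNT(*) OVER () is evaluated over every row that passes the WHERE clause before LIMIT is applied, so both approaches report the same total.

```python
import sqlite3

# In-memory SQLite database standing in for the real table (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, field TEXT)")
conn.executemany(
    "INSERT INTO events (field) VALUES (?)",
    [("a",)] * 1000 + [("b",)] * 500,
)

# Single query: the window count sees all matching rows, LIMIT trims afterwards.
rows = conn.execute(
    "SELECT id, COUNT(*) OVER () AS total "
    "FROM events WHERE field = ? LIMIT 300",
    ("a",),
).fetchall()

# Two-query variant: a separate aggregate for the total.
two_query_total = conn.execute(
    "SELECT COUNT(*) FROM events WHERE field = ?", ("a",)
).fetchone()[0]

print(len(rows), rows[0][1], two_query_total)
```

Which form is faster on Redshift still depends on the plan, so comparing both with EXPLAIN on the real table remains the deciding step.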

Related

How to SELECT TOP 95% of the row in a table

I want to create a performance report based on table data.
I don't know how many rows there are in the table, and I would like to have the top 95% of the rows based on some WHERE condition.
Table Structure -
Column Name - txid , start_time, end_time
For my Performance report I need to get the average of end_time - start_time. The common value of (end_time - start_time) ranges from 100ms to 1 sec.
However, there are a few transactions (less than 2%) that took around 100 to 2K sec because of one technical error or another.
I want to exclude those rows to get a fair average. Including them in my report raises a huge concern.
You can use a subquery. I would just go for row_number() and count(*), although other window functions such as ntile(), percentile_cont(), and percentile_disc() could be used for this purpose:
select t.*
from (select t.*,
row_number() over (order by <ordering col>) as seqnum,
count(*) over () as cnt
from t
where . . .
) t
where seqnum <= 0.95 * cnt;
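A minimal sketch of the row_number()/count(*) approach applied to the averaging problem from the question, run through Python's sqlite3 module (assuming SQLite 3.25+ for window functions). The table name `tx`, the millisecond units, and the data are invented for illustration: 96 ordinary transactions of 200 ms and 4 outliers of 500000 ms.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tx (txid INTEGER PRIMARY KEY, start_time INTEGER, end_time INTEGER)"
)
# Hypothetical data: 96 normal rows (200 ms) and 4 outliers (500000 ms).
conn.executemany(
    "INSERT INTO tx VALUES (?, ?, ?)",
    [(i, 0, 200) for i in range(96)] + [(96 + i, 0, 500000) for i in range(4)],
)

# Average over everything, outliers included.
raw_avg = conn.execute("SELECT AVG(end_time - start_time) FROM tx").fetchone()[0]

# Average over the fastest 95% only: number the rows by duration,
# then keep those whose sequence number is within 95% of the row count.
trimmed_avg = conn.execute(
    """
    SELECT AVG(duration)
    FROM (SELECT end_time - start_time AS duration,
                 ROW_NUMBER() OVER (ORDER BY end_time - start_time) AS seqnum,
                 COUNT(*) OVER () AS cnt
          FROM tx) t
    WHERE seqnum <= 0.95 * cnt
    """
).fetchone()[0]

print(raw_avg, trimmed_avg)
```

Ordering by the duration itself is what pushes the slow transactions into the discarded tail; ordering by any other column would trim an arbitrary 5% instead.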
Supposing you have a table TABLE with a field id:
select top (
(select count(Id) FROM [TABLE])*95/100
) id FROM [TABLE]
In TSQL:
DECLARE @ourCount as Int
DECLARE @topNinetyFive as Int
Select @ourCount = count(1) FROM [ourDatabase].[dbo].[ourTable]
Set @topNinetyFive = round(0.95 * @ourCount, 0)
Select TOP (@topNinetyFive) * FROM [ourDatabase].[dbo].[ourTable]
-- NOTE: a more meaningful criteria could be based on one of the columns with a 'where' clause

Get a row number on select statement while matching entire row

I am trying to get a row number of the row. Since the table doesn't have any id column, I have used ROW_NUMBER() without any order which is shown below.
SELECT
ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS SNO, *
FROM [table1]
Now the challenge is that I need to find a row matching a condition (just a SELECT statement with a WHERE clause), but together with its original row number.
SELECT TOP 1 *
FROM table1
WHERE [Total Sales] = 2555
This statement returns a single record. I have tried to use INTERSECT to combine both statements to get result with row number.
SELECT
ROW_NUMBER() OVER (ORDER BY (SELECT 100)) AS SNO, *
FROM [table1]
INTERSECT
SELECT TOP 1 *
FROM table1
WHERE [Total Sales] = 2555
Of course, this throws an error since the number of columns is different. So what is the correct way to get the actual row number?
When you run this query:
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS SNO, t.*
FROM [table1] t;
The SNO values are unstable. That means that the same query run multiple times might return different numbers. Sorting in SQL is not stable. That means that identical keys can be in an arbitrary order when the query is run multiple times. Why? SQL tables and result sets represent unordered sets. There is nothing to base a stable sort on.
The simplistic answer to your question is to use a subquery:
SELECT t.*
FROM (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS SNO, t.*
FROM [table1] t
) t
WHERE [Total Sales] = 2555;
However, the real answer is that you should be using multiple columns to create a stable sort, if you want to use this value for more than one query.
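To illustrate the point about a stable sort, here is a small sketch using Python's sqlite3 module (assuming SQLite 3.25+). The `region` column and the data are hypothetical; the idea is only that ordering ROW_NUMBER() by a combination of real columns that uniquely identifies each row yields the same number on every run, and the outer query can then filter without disturbing the numbering.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE table1 (region TEXT, "Total Sales" INTEGER)')
# Hypothetical rows; (region, "Total Sales") is unique here.
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?)",
    [("West", 2555), ("East", 100), ("North", 300)],
)

# Number the rows by a deterministic key inside the subquery,
# then apply the filter outside so the numbers are not recomputed.
match = conn.execute(
    """
    SELECT sno, region, "Total Sales"
    FROM (SELECT ROW_NUMBER() OVER (ORDER BY region, "Total Sales") AS sno,
                 region, "Total Sales"
          FROM table1) t
    WHERE "Total Sales" = 2555
    """
).fetchall()

print(match)
```

If the chosen columns contain duplicates, ties are still broken arbitrarily, so the ordering key needs enough columns to be unique.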
SQL does not have an inherent "row number" for the entries. The order in which a table is displayed depends entirely on the query results. If you want to keep the rows in the order they were put into the DB, then consider adding a timestamp that is generated by a trigger and attached to the row when it is inserted. You can then sort by that timestamp.
What's the primary key if there is no id?

How to sum two columns in sql without group by

I have columns such as pagecount, convertedpages and changedpages in a table along with many other columns.
pagecount is the sum of convertedpages and changedpages.
I need to select all rows along with pagecount, and I can't group them. I am wondering if there is any way to do it.
This select is part of a view. So can I use another SQL statement to bring back just the sum and then somehow make it part of the main SQL query?
Thank you.
SELECT
*,
(ConvertedPages + ChangedPages) as PageCount
FROM Table
If I'm understanding your question correctly, while I'm not sure why you can't use group by, another option would be to use a correlated subquery:
select distinct id,
(select sum(field) from yourtable y2 where y.id = y2.id) summedresult
from yourtable y
This assumes you have data such as:
id | field
1 | 10
1 | 15
2 | 10
And would be equivalent to:
select id, sum(field)
from yourtable
group by id
Not 100% on what you're after here, but if you want a total across rows without grouping, you can use OVER() with an aggregate in SQL Server:
SELECT *, SUM(convertedpages) OVER() AS total_convertedpages
, SUM(changedpages) OVER() AS total_changedpages
, SUM(changedpages + convertedpages) OVER() as PageCount
FROM Table
This repeats the total on every row. You can use PARTITION BY inside OVER() if you'd like the aggregate to be grouped by some fields while still displaying the full detail of all rows.
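A small sketch of both variants, run through Python's sqlite3 module (assuming SQLite 3.25+, whose OVER() syntax matches SQL Server's for this case). The `pages` table, the `doc` grouping column, and the numbers are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pages (id INTEGER PRIMARY KEY, doc TEXT, "
    "convertedpages INTEGER, changedpages INTEGER)"
)
# Hypothetical data: two rows for document 'a', one for document 'b'.
conn.executemany(
    "INSERT INTO pages (doc, convertedpages, changedpages) VALUES (?, ?, ?)",
    [("a", 1, 2), ("a", 3, 4), ("b", 5, 6)],
)

rows = conn.execute(
    """
    SELECT doc,
           convertedpages + changedpages AS pagecount,               -- per-row sum
           SUM(convertedpages + changedpages) OVER () AS grand_total, -- same on every row
           SUM(convertedpages + changedpages)
               OVER (PARTITION BY doc) AS doc_total                  -- total per doc
    FROM pages
    ORDER BY id
    """
).fetchall()

for row in rows:
    print(row)
```

Every input row is still returned; only the extra columns carry the aggregated values.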

Efficient query for the first result in groups (postgresql 9)

I have a table with 200000 rows and columns: name and date. The dates and names may have repeated values. I would like get the first 300 unique names for the dates sorted in an ascending order and have this run fast as my table may have a million rows.
I am using postgresql 9.
SELECT name, date
FROM
(
SELECT DISTINCT ON (name) name, date
FROM table
ORDER BY name, date
) AS id_date
ORDER BY date
LIMIT 300;
The last query of @jachguate will miss names that have two entries on the same date; this one doesn't.
The query takes about 100 ms on a non-optimized PostgreSQL 9.1 with about 100.000 entries, so it may not scale to millions of entries.
An upgrade to PostgreSQL 9.2 may help, since according to the release notes it includes many performance improvements.
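To make the difference between the two approaches in this thread concrete, here is a sketch using Python's sqlite3 module. SQLite has no DISTINCT ON, so the first-date-per-name query is swapped for an equivalent GROUP BY with MIN(date); that substitution, the `entries` table name, and the data are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entries (name TEXT, date TEXT)")
# Hypothetical data: 'bob' has two entries on the same date.
conn.executemany(
    "INSERT INTO entries VALUES (?, ?)",
    [
        ("ann", "2020-01-02"),
        ("ann", "2020-01-01"),
        ("bob", "2020-01-03"),
        ("bob", "2020-01-03"),
    ],
)

# Stand-in for DISTINCT ON (name) ... ORDER BY name, date:
# one row per name, carrying that name's earliest date.
first_per_name = conn.execute(
    "SELECT name, MIN(date) AS first_date FROM entries "
    "GROUP BY name ORDER BY first_date LIMIT 300"
).fetchall()

# The HAVING COUNT(*) = 1 form keeps only (date, name) pairs that occur once,
# so 'bob' disappears and 'ann' is listed twice.
having_variant = conn.execute(
    "SELECT name, date FROM entries "
    "GROUP BY date, name HAVING COUNT(*) = 1 ORDER BY date LIMIT 300"
).fetchall()

print(first_per_name)
print(having_variant)
```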
use a CTE:
with unique_date_name as (
select date, name, count(*) rcount
from table
group by date, name
having count(*) = 1
)
select name, date
from unique_date_name
order by date limit 300;
Edit
From the comments, this results in poor performance, so try this instead:
select date, name, count(*) rcount
from table
group by date, name
having count(*) = 1
order by date limit 300;
or, transforming the original query into a nested subquery in FROM instead of a CTE:
select name, date
from (
select date, name, count(*) rcount
from table
group by date, name
having count(*) = 1
) unique_date_name
order by date limit 300;
Unfortunately I don't have a PostgreSQL instance at hand to check whether it works, but the optimizer should do a better job with it.
An index on (date, name) is a must for optimal performance.

SELECT *, COUNT(*) in SQLite

If I perform a standard query in SQLite:
SELECT * FROM my_table
I get all records in my table as expected. If I perform the following query:
SELECT *, 1 FROM my_table
I get all records as expected, with the rightmost column holding '1' in every record. But if I perform the query:
SELECT *, COUNT(*) FROM my_table
I get only ONE row (whose rightmost column holds the correct count).
Why do I get such results? I'm not very good at SQL, so maybe this behavior is expected? It seems very strange and illogical to me :(.
SELECT *, COUNT(*) FROM my_table is not what you want, and it's not really valid SQL: you have to group by all the columns that are not aggregates.
You'd want something like:
SELECT somecolumn,someothercolumn, COUNT(*)
FROM my_table
GROUP BY somecolumn,someothercolumn
If you want to count the number of records in your table, simply run:
SELECT COUNT(*) FROM your_table;
count(*) is an aggregate function. Aggregate functions need a grouping to give meaningful results. You can read: count columns group by
If what you want is the total number of records in the table appended to each row you can do something like
SELECT *
FROM my_table
CROSS JOIN (SELECT COUNT(*) AS COUNT_OF_RECS_IN_MY_TABLE
FROM MY_TABLE)
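The behaviour described in the question and the CROSS JOIN fix can be reproduced with Python's built-in sqlite3 module. This is only a sketch; the columns and rows of `my_table` are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER PRIMARY KEY, label TEXT)")
conn.executemany(
    "INSERT INTO my_table (label) VALUES (?)", [("x",), ("y",), ("z",)]
)

# SQLite accepts bare columns next to an aggregate: with no GROUP BY the whole
# table is one group, so a single row comes back with the count attached.
collapsed = conn.execute("SELECT *, COUNT(*) FROM my_table").fetchall()

# Cross-joining against a one-row subquery keeps every row
# and appends the total to each of them.
joined = conn.execute(
    "SELECT * FROM my_table "
    "CROSS JOIN (SELECT COUNT(*) AS count_of_recs FROM my_table) "
    "ORDER BY id"
).fetchall()

print(collapsed)
print(joined)
```

Which row's values fill the non-aggregate columns in the collapsed result is arbitrary, which is one more reason to avoid that form.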