generate sequence in impala sql - sql

I'd like to generate in SQL some rows of fake data with a sequence of integers (given a max number).
The result should be something like this:
1
2
3
4
5
...
10
Thank you very much

There is no easy way to generate a series in Impala, but it's not too difficult either.
You need a table with lots of rows (at least as many as the sequence you want) for this.
SELECT seq FROM (
  SELECT
    row_number() OVER (PARTITION BY 'test' ORDER BY 'test') AS seq
  FROM large_table
) rs
WHERE seq <= 100
Here:
row_number() generates a row number for each row in the large table.
The WHERE clause limits the result to 100 rows; you can change this to any number.
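If no single table has enough rows, one possible workaround (a sketch, not part of the original answer; small_table is a placeholder name) is to cross join a table with itself before numbering:
SELECT seq FROM (
  SELECT row_number() OVER (PARTITION BY 'x' ORDER BY 'x') AS seq
  FROM small_table a
  CROSS JOIN small_table b
) rs
WHERE seq <= 100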

Related

Oracle select specific rows

Can we select a specific range of rows in Oracle? For example, I have a table of 100 rows and I have to select only rows 10 to 20. Is it possible to do that?
You can do it with an auxiliary step. First number the rows with the row_number() function, then filter on that number:
select *
from (
  select row_number() over (order by 0) rn, t.*
  from tab t
)
where rn between 10 and 20;
But this is not a stable operation, since SQL result sets are unordered. It's therefore better to define a unique identity column and order by it.
Replace the zero in the ORDER BY clause with one or more columns of your table to get a deterministic ordering. If a primary key column exists, it's usually best to order by it alone.
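For example, assuming the table has a primary key column named id (a hypothetical name), the query becomes:
select *
from (
  select row_number() over (order by t.id) rn, t.*
  from tab t
)
where rn between 10 and 20;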
Would LIMIT and OFFSET work? I.e.
SELECT * FROM table
LIMIT 20
OFFSET 20
will return rows 21 through 40. Is this what you are trying to do?
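Note that LIMIT/OFFSET is not valid Oracle syntax; in Oracle 12c and later the comparable construct is the row-limiting clause. A sketch against the tab table used in the answer above (ordering by a hypothetical id column):
SELECT *
FROM tab
ORDER BY id
OFFSET 9 ROWS FETCH NEXT 11 ROWS ONLY; -- rows 10 to 20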

Fetch the number of rows that can be returned by a select query

I'm trying to fetch data and show it in a table with pagination, so I use LIMIT and OFFSET for that, but I also need to show the number of rows that the query could return in total. Is there any way to get that?
I tried
resultSet.last() and getRow()
select count(*) from (query) myNewTable;
In both cases I get the correct answer, but is this the right way to do it? Performance is a concern.
We can get a limited set of records using the code below.
First, we need to set how many records we want, for example:
var limit = 10;
After that, pass this limit to the statement below:
WITH
Temp AS (
  SELECT
    ROW_NUMBER() OVER (ORDER BY primaryKey DESC) AS RowNumber,
    *
  FROM myNewTable
),
Temp2 AS (
  SELECT COUNT(*) AS TotalCount FROM Temp
)
SELECT TOP (:limit) * FROM Temp, Temp2 WHERE RowNumber > :offset ORDER BY RowNumber
This runs in MS SQL Server; in MySQL you would use LIMIT/OFFSET instead of TOP.
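A rough MySQL 8 sketch of the same idea (an assumption, since TOP is SQL Server syntax; the table and column names are the hypothetical ones above):
WITH
Temp AS (
  SELECT ROW_NUMBER() OVER (ORDER BY primaryKey DESC) AS RowNumber, t.*
  FROM myNewTable t
),
Temp2 AS (
  SELECT COUNT(*) AS TotalCount FROM Temp
)
SELECT Temp.*, Temp2.TotalCount
FROM Temp, Temp2
ORDER BY RowNumber
LIMIT 10 OFFSET 20;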
There is no easy way of doing this.
1. As you found out, it usually boils down to executing 2 queries:
Executing SELECT with limit and offset in order to fetch the data that you need.
Executing a COUNT(*) in order to count the total number of pages.
This approach might work for tables that don't have a lot of rows, or when you filter the data (in the COUNT and SELECT queries) on a column that is indexed.
2. If your table is large, but the data that you need to show represents a smaller percentage of the data in the table and the data shares a common trait (for example, the data in all of your pages was created on a single day), you can use partitioning. Executing COUNT and SELECT on a single partition will be much faster than executing them on the whole table.
3. You can create another table which will store the value of the COUNT query.
For example, let's say that your big_table table looks like this:
id | user_id | timestamp_column | text_column | another_text_column
Now, your SELECT query looks like this:
SELECT * FROM big_table WHERE user_id = 4 ORDER BY timestamp_column LIMIT 20 OFFSET 20;
And your count query:
SELECT COUNT(*) FROM big_table WHERE user_id = 4;
You could create a count_table that will have the following format:
user_id | count
Once you fill this table with the current data in the system, you create a trigger which updates count_table on every insert into or update of big_table (see the sketch at the end of this answer).
This way, the count query will be really fast, because it is executed on count_table, for example:
SELECT count FROM count_table WHERE user_id = 4
The drawback of this approach is that inserts into big_table will be slower, since the trigger fires and updates count_table on every insert.
These are the approaches you can try, but in the end it all depends on the size and type of your data.
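For completeness, a minimal sketch of the counter-table idea from point 3 (MySQL syntax assumed; big_table, count_table and user_id are the hypothetical names used above):
CREATE TABLE count_table (
  user_id INT PRIMARY KEY,
  `count` BIGINT NOT NULL
);
-- Seed the counter table from the existing data once.
INSERT INTO count_table (user_id, `count`)
SELECT user_id, COUNT(*) FROM big_table GROUP BY user_id;
-- Keep it in sync on every insert into big_table; analogous triggers
-- would be needed for DELETE (and for UPDATEs that change user_id).
CREATE TRIGGER big_table_count_ins
AFTER INSERT ON big_table
FOR EACH ROW
  INSERT INTO count_table (user_id, `count`)
  VALUES (NEW.user_id, 1)
  ON DUPLICATE KEY UPDATE `count` = `count` + 1;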

Joining a series in postgres with a select query

I'm looking for a way to join these two queries (or run these two together):
SELECT s
FROM generate_series(1, 50) s;
With this query:
SELECT id FROM foo ORDER BY RANDOM() LIMIT 50;
In a way where I get 50 rows like this:
series, ids_from_foo
1, 53
2, 34
3, 23
I've been at it for a couple of days now and I can't figure it out. Any help would be great.
Use row_number()
select row_number() over() as rn, id
from (
  select id
  from foo
  order by random()
  limit 50
) s
order by rn;
Picking the top n rows from a randomly sorted table is a simple but slow way to pick 50 rows randomly: all rows have to be sorted that way.
That doesn't matter much for small to medium tables and one-time, ad hoc use. For repeated use on a big table, there are much more efficient ways.
If the ratio of gaps / islands in the primary key is low, use this:
SELECT row_number() OVER() AS rn, *
FROM (
  SELECT *
  FROM (
    SELECT trunc(random() * 999999)::int AS foo_id
    FROM generate_series(1, 55) g
    GROUP BY 1 -- fold duplicates
  ) sub1
  JOIN foo USING (foo_id)
  LIMIT 50
) sub2;
With an index on foo_id, this is blazingly fast, no matter how big the table. (A primary key serves just fine.) Compare performance with EXPLAIN ANALYZE.
How?
999999 is an estimated row count of the table, rounded up. You can get it cheaply from:
SELECT reltuples FROM pg_class WHERE oid = 'foo'::regclass;
Round up to easily include possible new entries since the last ANALYZE. You can also use the expression itself in the query dynamically; it's cheap. Details:
Fast way to discover the row count of a table in PostgreSQL
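For instance, you can inline the estimate instead of hard-coding 999999 (a sketch derived from the two snippets above):
SELECT trunc(random() * (SELECT reltuples FROM pg_class WHERE oid = 'foo'::regclass))::int AS foo_id
FROM generate_series(1, 55) g
GROUP BY 1; -- fold duplicates, as above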
55 is your desired number of rows (50) in the result, multiplied by a low factor to easily make up for the gap ratio in your table and (unlikely but possible) duplicate random numbers.
If your primary key does not start near 1 (it does not have to be exactly 1; gaps are covered), add the minimum pk value to the calculation:
min_pkey + trunc(random() * 999999)::int
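A sketch of that shifted version, inlining the minimum key instead of the min_pkey placeholder:
SELECT (SELECT min(foo_id) FROM foo) + trunc(random() * 999999)::int AS foo_id
FROM generate_series(1, 55) g
GROUP BY 1;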
Detailed explanation here:
Best way to select random rows PostgreSQL

Performance issue when using ORDER BY dbms_random.value for Oracle database

I need to get 1000 random rows from a table and found a solution for Oracle. But when I use this query on a table containing a large number of rows, it takes up to 3 minutes to complete:
SELECT column FROM
( SELECT column FROM table
ORDER BY dbms_random.value )
WHERE rownum <= 1000
It happens because all rows are selected and then all of them are ordered by a random value, when I only need 1000. Is there any workaround for this problem? Maybe using dbms_random.value along with some cursor that picks random rows.
I would do that in this manner:
SELECT column
FROM table SAMPLE (1)
WHERE rownum <= 1000
--ORDER BY dbms_random.value
;
This will get a sample of 1 percent of the table and stop at the first 1000 rows (and, if needed, order them randomly).
There may be a better way to do what you want; this is what I would try.
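If the 1000 rows should also come back in random order, a small variation of the same idea (a sketch; my_table and my_column are placeholder names, since COLUMN and TABLE are reserved words in Oracle):
SELECT my_column
FROM (
  SELECT my_column
  FROM my_table SAMPLE (1)
  ORDER BY dbms_random.value
)
WHERE rownum <= 1000;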

Optimizing a simple SQLite query, if possible!

I would like to optimize this query using SQLite 3.
SELECT id FROM Table WHERE value = (SELECT max(value) FROM Table WHERE value < myvalue )
UNION
SELECT id FROM Table WHERE value = (SELECT min(value) FROM Table WHERE value > myvalue );
I want the 2 closest ids for a given value. Example: id 20, value 50. The closest ids could be id 3 with value 48 (the largest value below) and id 4 with value 55 (the smallest value above).
SQLite 3 does not have all the features of a full database; if you have something better I can use, well, thanks!
SELECT
(SELECT id FROM test WHERE value < myvalue ORDER BY value DESC LIMIT 1) as below,
(SELECT id FROM test WHERE value > myvalue ORDER BY value ASC LIMIT 1) as above;
Theoretically speaking this should be faster because it uses two table scans instead of four.
Anyway, I would create a table with a few million records and test different queries with the timer on (.timer ON in the sqlite console).
Also make sure to test with and without an index on value. Sometimes, especially when the index size is bigger than your memory, indexes are useless.
If speed is the real issue, consider an alternative lightweight storage engine, like Kyoto Cabinet.
Here's another way to do it. I don't know if it's faster in SQLite, though; you can always try. Each branch needs its own subselect, because SQLite does not allow ORDER BY / LIMIT directly on the parts of a compound query, and both branches order by the absolute difference ascending so that each returns the closest row:
select id from (
  select id
  from Table
  where value - myvalue > 0
  order by abs(value - myvalue) asc
  limit 1
)
union all
select id from (
  select id
  from Table
  where value - myvalue < 0
  order by abs(value - myvalue) asc
  limit 1
)
SELECT id FROM Table WHERE value > myvalue ORDER BY value LIMIT 1
SELECT id FROM Table WHERE value < myvalue ORDER BY value DESC LIMIT 1
This solution has no sub-selects, no full table scans (given an index), and no extraneous grouping or math functions,
but it needs two queries.
You should index Table.value.
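For example, a minimal sketch of that index (the table name is double-quoted here only because TABLE is an SQL keyword; use your real table name):
CREATE INDEX idx_value ON "Table"(value);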