My database table has 15 records and I want to show 9 at random on screen.
SELECT * FROM tablename ORDER BY RAND() LIMIT 9
This works as expected, but what if the table only has 9 records? I need to pull 15 random records.
I understand this will duplicate one or more records, but that's my intention.
Your select will only pull the number of records in the table regardless of the order by. You can use various means to duplicate the table data, however, before you order them. For example, union all of the rows together twice:
select * from
(
select * from tablename
union all
select * from tablename
) as tmp
order by rand() limit 9
RAND() itself is not efficient when you deal with a large database.
A better way of doing such query is to:
-1. Query the largest id (assume id is the unique key)
-2. use javascript of php function to generate 15 random numbers from 1 to max_id, push to -array
-3. Implode the array (e.g. $id_list = "'".implode("', '", $id_list)."'")
-4. Select * from tablename where id in ($id_list)
This will work even if you have only 1 row in the table. If you have less than 15 (say 11) rows, you'll have all 11 in the result plus 4 more random ones:
SELECT col1, col2, ..., colN -- the columns of `tablename` only
FROM
( SELECT a.i, b.j, t.*
FROM
( SELECT *, RAND() AS rnd
FROM tablename
ORDER BY rnd LIMIT 15
) AS t
CROSS JOIN
( SELECT 1 AS i UNION ALL SELECT 2 UNION ALL
SELECT 3 UNION ALL SELECT 4 )
AS a
CROSS JOIN
( SELECT 1 AS j UNION ALL SELECT 2 UNION ALL
SELECT 3 UNION ALL SELECT 4 )
AS b
ORDER BY i, j, rnd
LIMIT 15
) AS t15
ORDER BY RAND() ;
If you want "more" randomness, having duplicate or triplicate rows in the results with possibly some rows not shown at all, replace the last five lines with:
AS b
ORDER BY RAND()
LIMIT 15
) AS t15 ;
Related
In SQL, I use the following code to remove duplicates from a table based on a unique ID:
1. SELECT Unique_ID INTO holdkey FROM [Origination] GROUP BY Unique_ID HAVING count(*) > 1
2. SELECT DISTINCT Origination.*
INTO holddups
FROM [Origination], holdkey
WHERE [Origination].Unique_ID = holdkey.Unique_ID
3. DELETE Origination
FROM Origination, holdkey
WHERE Origination.Unique_ID = holdkey.Unique_ID
4. INSERT Origination SELECT * FROM holddups
The second process does not work on BigQuery. Regardless of how I change the query, I get errors for unrecognized columns and tables.
I obviously take out "select into" queries and just set the destination tables manually. I have SQL experience, and I know the process works. Does anyone have a sample of syntax that they use for the process of removing duplicate records based on a unique ID for BQ? Or a way to modify this that would make it run?
So, the trick is in having proper SELECT here
Below example is for BigQuery Standard SQL
#standardSQL
SELECT row[OFFSET(0)].* FROM (
SELECT ARRAY_AGG(t ORDER BY value DESC LIMIT 1) row
FROM `project.dataset.table_with_dups` t
GROUP BY id
)
you can test / play with above using dummy data as below
#standardSQL
WITH `project.dataset.table_with_dups` AS (
SELECT 1 id, 2 value UNION ALL SELECT 1,3 UNION ALL SELECT 1,4 UNION ALL
SELECT 2,5 UNION ALL
SELECT 3,6 UNION ALL SELECT 3,7 UNION ALL
SELECT 4,8 UNION ALL
SELECT 5,9 UNION ALL SELECT 5,10
)
SELECT row[OFFSET(0)].* FROM (
SELECT ARRAY_AGG(t ORDER BY value DESC LIMIT 1) row
FROM `project.dataset.table_with_dups` t
GROUP BY id
)
with result as
Row id value
1 1 4
2 2 5
3 3 7
4 4 8
5 5 10
As you can see it easily dedups table by id leaving row with largest value. Does not matter how many more other columns in that table - above still works (it does not care of schema rather than id and value)
So, now, you can just use above SELECT and insert result into new table or overwrite original, etc. - all in one shot!
I need a query with a column with row number (probably using ROW_NUMBER() ) and if the result are 8 rows (e.g.) the query should result 10 rows with rows 9 and 10 blank except row number. If the result is 15 rows the result should be 20 rows, and so on...
It is possivel?
Normally, something like this would be done in the application layer. However, you can do this in SQL:
select t.*
from table t
union all
select nulls.*
from (select 1 as n union all select 2 union all . . . select 10
) n cross join
(select count(*) cnt from table) cnt left join
table nulls
on 1 = 0
where 10 * floor(cnt / 10) + n.n <= cnt;
The first subquery gets all your data. The second gets the additional rows with NULL values. It uses a left join with "false" condition to get all the columns.
There seems to be a few blog posts on this topic but the solutions really are not so intuitive. Surely there's a "Canonical" way?
I'm using Teradata SQL.
How would I select
A range of number
A date range
E.g.
SELECT 1:10 AS Nums
SELECT 1-1-2010:5-1-2014 AS Dates1
The result would be 10 rows (1 - 10) in the first SELECT query and ~(365 * 3.5) rows in the second?
The "canonical" way to do this in SQL is using recursive CTEs, which the more recent versions of Teradata support.
For your first example:
with recursive nums(n) as (
select 1 as n
union all
select n + 1
from nums
where n < 10
)
select *
from nums;
You can do something similar for dates.
EDIT:
You can also do this by using row_number() and an existing table:
with nums(n) as (
select n
from (select row_number() over (order by col) as n
from ExstingTable t
) t
where n <= 10
)
select *
from nums;
ExistingTable is just any table with enough rows. The best choice of col is the primary key.
with digits(n) as (
select 1 as n union all select 2 union all select 3 union all select 4 union all select 5 union all
select 6 union all select 7 union all select 8 union all select 9 union all select 10
)
select *
from digits;
If your version of Teradata supports multiple CTEs, you can build on the above:
with digits(n) as (
select 1 as n union all select 2 union all select 3 union all select 4 union all select 5 union all
select 6 union all select 7 union all select 8 union all select 9 union all select 10
),
nums(n) as (
select d1.n*100 + d2.n*10 + d3.n
from digits d1 cross join digits d2 cross join digits d3
)
select *
from nums;
In Teradata you can use the existing sys_calendar to get those dates:
SELECT calendar_date
FROM sys_calendar.CALENDAR
WHERE calendar_date BETWEEN DATE '2010-01-01' AND DATE '2014-05-01';
Note:
DATE '2010-01-01' is the only recommended way to write a date in Teradata
There's probably another custom calendar for the specific business needs of your company, too. Everyone will have access rights to it.
You might also use this for the range of numbers:
SELECT day_of_calendar
FROM sys_calendar.CALENDAR
WHERE day_of_calendar BETWEEN 1 AND 10;
But you should check Explain to see if the estimated number of rows is correct. sys_calendar is a kind of template and day_of_calendar is a calculated column, so no statistics exists on that and Explain will return an estimated number of 14683 (20 percent of the number of rows in that table) instead of 10. If you use it in additional joins the optimizer might do a bad plan based on that totally wrong number.
Note:
If you use sys_calendar you are limited to a maximum of 73414 rows, dates between 1900-01-01 and 2100-12-31 and numbers between 1 and 73414, your business calendar might vary.
Gordon Linoff's recursive query is not really efficient in Teradata, as it's a sequential row-by-row processing in a parallel database (each loop is an "all-AMPs step" in Explain) and the optimizer doesn't know how many rows will be returned.
If you need those ranges regularly you might consider creating a numbers table, I usually got one with a million rows or I use my calendar with the full range of 10000 years :-)
--DROP TABLE nums;
CREATE TABLE nums(n INT NOT NULL PRIMARY KEY CHECK (n BETWEEN 0 AND 999999));
INSERT INTO Nums
WITH cte(n) AS
(
SELECT day_of_calendar - 1
FROM sys_calendar.CALENDAR
WHERE day_of_calendar BETWEEN 1 AND 1000
)
SELECT
t1.n +
t2.n * 1000
FROM cte t1 CROSS JOIN cte t2;
COLLECT STATISTICS COLUMN(n) ON Nums;
The COLLECT STATS is the most important step to get correct estimates.
Now it's a simple
SELECT n FROM nums WHERE n BETWEEN 1 AND 10;
There's also a nice UDF on GitHub for creating sequences which is easy to use:
SELECT DATE '2010-01-01' + SEQUENCE
FROM TABLE(gen_sequence(0,DATE '2014-05-01' - DATE '2010-01-01')) AS t;
SELECT SEQUENCE
FROM TABLE(gen_sequence(1,10)) AS t;
But it's usually hard to convince your DBA to install any C-UDFs and the number of rows returned is unknown again.
sequence 1 to 10
sel sum (1) over (ROWS UNBOUNDED PRECEDING) as seq_val
from sys_calendar.CALENDAR
qualify row_number () over (order by 1)<=10
I want Union of two queries with the first query resultset getting the first ten rownums in Oracle.
Example:
Like if first query has 10 rows and max rownum is 10.I want second query rownum to be started from 11 in the result of union.
SELECT *
FROM (
SELECT *
FROM table1
ORDER BY
col1
)
WHERE rownum <= 10
UNION ALL
SELECT *
FROM (
SELECT *, rownum AS rn
FROM (
SELECT *
FROM table2
ORDER BY
col2
)
)
WHERE rn > 10
First of all, I'm not familiar with SQL in depth, so this may be a beginner question.
I know how to select data ordered by Id: SELECT * FROM foo ORDER BY id LIMIT 100 as well as how to select a random subset: SELECT * FROM foo ORDER BY RAND() LIMIT 100.
I'd like to merge these two queries into 1 in a zip manner, choosing limit/2 from each (i.e. 50). For example:
0
85
1
35
2
38
3
19
4
...
I would like to avoid duplicates. The easiest way is probably to just add a WHERE id > 100/2 to the part of the query that retrieves randomly ordered rows.
Additional info: It is unknown how many rows exist.
To get the "zip-manner" merge add a generated rownumber to each query and use an union with order by rownnumber.
Use even numbers for one and odd numbers for the other query.
Try this for MySQL
SELECT
#rownum0:=#rownum0+2 rn,
f.*
FROM ( SELECT * FROM foo ORDER BY id ) f, (SELECT #rownum0:=0) r
UNION
SELECT #rownum1:=#rownum1+2 rn,
b.*
FROM ( SELECT * FROM bar ORDER BY RAND() ) b, (SELECT #rownum1:=-1) r
ORDER BY rn
LIMIT 100
This should be self-explicative but doesnt remove duplicates:
select #rownum:=#rownum+1 as rownum,
(#rownum-1) % 50 as sortc, u.id
from (
(select id from player order by id limit 50)
union all
(select id from player order by rand() limit 50)) u,
(select #rownum:=0) r
order by sortc,rownum;
If you replace "union all" with "union", you remove duplicates but get less rows as a consequence.
This will deal with duplicates, does not restrict random numbers in the ids > 50, and always return 100 rows:
SELECT #rownum := #rownum + 1 AS rownum,
( #rownum - 1 ) % 50 AS sortc,
u.id
FROM ((SELECT id
FROM foo
ORDER BY Rand()
LIMIT 50)
UNION
(SELECT id
FROM foo
WHERE id <= 100
ORDER BY id)) u,
(SELECT #rownum := 0) r
WHERE #rownum < 100
ORDER BY sortc,
rownum DESC
LIMIT 100;
SELECT * FROM foo ORDER BY id LIMIT 50 UNION SELECT * FROM foo ORDER BY RAND() LIMIT 50
If I understand your requirement correctly. UNION removes duplicates by itself