I have three queries and I want result rows consisting of entries of these queries randomly joined next to each other.
I don't want to union the results, but to join them in a more or less random way (the original distribution may be kept, or can be unified across all).
I tried the following:
select *
from
(
SELECT street, number
FROM Addresses
WHERE valid = '1'
order by Dbms_Random.Value
) q1 ,
(
select prename
from person
order by Dbms_Random.Value
) q2 ,
(
select surname
from person
order by Dbms_Random.Value
) q3
My result set, however, does not look random at all:
Main street, 1, Andre, Smith
Main street, 1, Andre, Warnes
Main street, 1, Andre, Jackson
Main street, 1, Andre, Macallister
Removing the ORDER BY from the queries and applying it to the result of the cartesian product is extremely inefficient, as the tables are large, and especially so is their cartesian product.
Colin 't Hart diagnosed the problem and suggested a workaround using ROWNUM. But the solution is slightly more complicated than that, because ROWNUM is assigned before the ORDER BY if they both appear in the same SELECT. The solution is to add one extra subquery level.
with randomAddress as(
select rownum id, street, num from (
select * from addresses where valid=1 order by dbms_random.random
)
),
randomPrename as(
select rownum id, prename from(
select * from person order by dbms_random.random
)
),
randomSurname as(
select rownum id, surname from(
select * from person order by dbms_random.random
)
)
select street, num, prename, surname
from randomAddress
join randomPrename using(id)
join randomSurname using(id)
;
This solution will always return a number of random rows that is equal to the number of rows in the smallest table. No row will be used more than once. Here is the SQL Fiddle.
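The row-number pairing described above can be tried out locally. The following is a minimal sketch, assuming SQLite (through Python's sqlite3 module, SQLite 3.25 or later for window functions) as a stand-in for Oracle: `ROW_NUMBER() OVER (ORDER BY random())` plays the role of ROWNUM over a randomly ordered subquery, and the table contents are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE addresses (street TEXT, num INTEGER, valid INTEGER);
    CREATE TABLE person (prename TEXT, surname TEXT);
    -- made-up sample rows, not from the original question
    INSERT INTO addresses VALUES
        ('Main street', 1, 1), ('High street', 7, 1),
        ('Mill lane', 3, 1), ('Closed road', 9, 0);
    INSERT INTO person VALUES
        ('Andre', 'Smith'), ('Bea', 'Warnes'), ('Carl', 'Jackson'),
        ('Dora', 'Macallister'), ('Ed', 'Young');
""")

# Each CTE numbers its rows 1..n in a random order; joining on that
# number pairs the i-th random address with the i-th random names.
pairing_sql = """
WITH random_address AS (
    SELECT ROW_NUMBER() OVER (ORDER BY random()) AS id, street, num
    FROM addresses
    WHERE valid = 1
),
random_prename AS (
    SELECT ROW_NUMBER() OVER (ORDER BY random()) AS id, prename FROM person
),
random_surname AS (
    SELECT ROW_NUMBER() OVER (ORDER BY random()) AS id, surname FROM person
)
SELECT street, num, prename, surname
FROM random_address
JOIN random_prename USING (id)
JOIN random_surname USING (id)
"""
rows = conn.execute(pairing_sql).fetchall()
for row in rows:
    print(row)
```

With three valid addresses and five people, this yields three rows, and no address, prename or surname appears twice.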
The number of rows returned by the GWu solution will vary depending on how many rows are assigned the same random number. Some rows may be used multiple times, and other rows not at all. You should also have an idea of how many rows are in the tables to use that solution.
You could move Dbms_Random.Value to a column in your subquery and join by it.
This will randomize the result and also get rid of the order by:
select *
from
(
SELECT street, snumber, ROUND(Dbms_Random.Value(1,10)) n
FROM Addresses
WHERE valid = '1'
) q1 ,
(
select prename, ROUND(Dbms_Random.Value(1,10)) n
from person
) q2 ,
(
select surname, ROUND(Dbms_Random.Value(1,10)) n
from person
) q3
where q1.n = q2.n
and q2.n = q3.n
;
(see also http://www.sqlfiddle.com/#!4/a26d0/9)
The value 10 in ROUND(Dbms_Random.Value(1,10)) is just an assumption, change it to your number of expected or available records.
Note that this solution reuses results from each subquery, so for example prename might be used more than once or not at all, but that was also the case in your original cartesian join.
Colin's approach ensures uniqueness, if you need that.
The problem you're having is that while each table is being ordered randomly, you still have a cartesian product, so the top rows will have the same values in the first 2 columns with only the last column varying.
If you select the pseudo column ROWNUM (you'll need to alias it as e.g. row_number), and then join the 3 tables on row_number, you should get a random combination of data from your 3 tables.
But you'll be limited to a total number of rows equal to the number of rows in the smallest table.
Table1 has the following 2 columns and 4 rows:
Entity Number
------ ------
Car 4
Shop 1
Apple 3
Pear 1
I'd like to have one set-based SQL query which produces the desired results below, basically duplicating each Entity by the number of times given in the Number column.
I could only do it by looping through the rows one by one, which is neither elegant nor set based.
Desired result:
Entity
------
Car
Car
Car
Car
Shop
Apple
Apple
Apple
Pear
One method uses recursive CTEs:
with cte as (
select t1.entity, t1.number
from table1 t1
union all
select cte.entity, cte.number - 1
from cte
where cte.number > 1  -- stop after the last copy; "> 0" would emit one row too many
)
select entity
from cte;
Note: With the default settings, recursion stops with an error after 100 levels, which limits this to roughly 100 rows per entity. You can use OPTION (MAXRECURSION 0) to get around this.
You can also solve this with a numbers table, but such a problem is a good introduction to recursive CTEs.
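To see the recursive expansion run end to end, here is a small sketch that assumes SQLite through Python's sqlite3 module rather than SQL Server, so the syntax differs slightly (`WITH RECURSIVE`, no MAXRECURSION option). The column alias `remaining` is my own name for the countdown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (entity TEXT, number INTEGER);
    INSERT INTO table1 VALUES ('Car', 4), ('Shop', 1), ('Apple', 3), ('Pear', 1);
""")

# Anchor: one row per entity carrying its count.
# Recursive step: emit another row while more than one copy remains.
expand_sql = """
WITH RECURSIVE cte(entity, remaining) AS (
    SELECT entity, number FROM table1 WHERE number > 0
    UNION ALL
    SELECT entity, remaining - 1 FROM cte WHERE remaining > 1
)
SELECT entity FROM cte ORDER BY entity
"""
entities = [row[0] for row in conn.execute(expand_sql)]
print(entities)
```

The anchor contributes the first copy and every recursive step adds one more, so an entity with Number n ends up with exactly n rows.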
Use this
;WITH CTE
AS
(
SELECT
SeqNo = 1,
Entity,
Number
FROM YourTable
UNION ALL
SELECT
SeqNo = SeqNo+1,
Entity,
Number
FROM CTE
WHERE SeqNo < Number
)
SELECT
Entity
FROM CTE
ORDER BY 1
A non-recursive solution is to use a fixed sequence of numbers and join the table against it, like this:
WITH numbers
AS
(
SELECT n
FROM (VALUES(1),(2),(3),(4),(5),(6),(7),(8),(9), (10)) AS numbers(n)
)
SELECT t.Entity
FROM Table1 AS t
INNER JOIN numbers as n ON t.number >= n.n;
This will support up to 10 times duplication, you can add extra numbers to support extra duplication times.
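The cap of 10 is easy to observe in a quick experiment. This sketch assumes SQLite through Python's sqlite3 module as a stand-in for SQL Server, and adds a hypothetical row ('Bolt', 12) that is not in the original data, purely to show what happens when a count exceeds the size of the numbers list.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (entity TEXT, number INTEGER);
    INSERT INTO table1 VALUES ('Car', 4), ('Shop', 1), ('Apple', 3), ('Pear', 1);
    -- hypothetical extra row whose count exceeds the numbers list
    INSERT INTO table1 VALUES ('Bolt', 12);
""")

# Join every entity to the numbers 1..number; each match is one copy.
join_sql = """
WITH numbers(n) AS (
    VALUES (1), (2), (3), (4), (5), (6), (7), (8), (9), (10)
)
SELECT t.entity, n.n AS copy_no
FROM table1 AS t
JOIN numbers AS n ON n.n <= t.number
ORDER BY t.entity, n.n
"""
copies = conn.execute(join_sql).fetchall()
per_entity = {}
for entity, _copy_no in copies:
    per_entity[entity] = per_entity.get(entity, 0) + 1
print(per_entity)
```

'Bolt' is silently truncated to 10 copies, which is the failure mode to watch for when the numbers list is shorter than the largest count.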
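You can use master..spt_values as the source for a numbers table (it is an undocumented system table, so treat this as a convenience rather than a guaranteed interface):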
select EntityList.*
from EntityList
, (
select number as n from master..spt_values WHERE Type = 'P' and Number between 1 and (select max(number) from EntityList)
) t
where n <= number
order by entity
For a simple SQL like,
SELECT top 3 MyId FROM MyTable ORDER BY NEWID()
how to add row numbers to them so that the row numbers become 1,2, and 3?
UPDATE:
I thought I can simplify my question as above, but it turns out to be more complicated. So here is a fuller version -- I need to give three random picks (from MyTable) for each person, with pick/row number of 1, 2, and 3, and there is no logical joining between person and picks.
SELECT * FROM Person
LEFT JOIN (
SELECT top 3 MyId FROM MyTable ORDER BY NEWID()
) D ON 1=1
The problems with the above SQL are:
First, and obviously, a pick/row number of 1, 2, and 3 should be added.
Second, and less obviously, the above SQL gives every person the same picks, whereas I need to give different people different picks.
Here is a working SQL to test it out:
SELECT TOP 15 database_id, create_date, cs.name FROM sys.databases
CROSS apply (
SELECT top 3 Row_number()OVER(ORDER BY (SELECT NULL)) AS RowNo,*
FROM (SELECT top 3 name from sys.all_views ORDER BY NEWID()) T
) cs
So, please help.
NOTE: This is NOT about MySQL but T-SQL. Their syntax is different, so the solution is different as well.
Add Row_number to outer query. Try this
SELECT Row_number()OVER(ORDER BY (SELECT NULL)),*
FROM (SELECT TOP 3 MyId
FROM MyTable
ORDER BY Newid()) a
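Logically, TOP is processed after the SELECT list. If the row number were generated in the same query, it would be assigned before the 3 random records are picked, so the picked rows would not be numbered 1, 2 and 3. That is why the row number should not be generated in the original query.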
Update
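It can be achieved through CROSS APPLY. The WHERE clause inside the CROSS APPLY only exists to reference the outer row, so that the subquery is evaluated once per person; replace some_col with any valid column name from the Person table.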
SELECT *
FROM Person p
CROSS apply (SELECT Row_number()OVER(ORDER BY (SELECT NULL)) rn,*
FROM (SELECT TOP 3 MyId
FROM MyTable
WHERE p.some_col = p.some_col -- Replace it with some column from person table
ORDER BY Newid())a) cs
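For readers without SQL Server at hand, the goal (three different random picks per person, numbered 1 to 3) can be reproduced in SQLite. Note that this is a swapped-in technique, not the CROSS APPLY above: SQLite has no CROSS APPLY, so this sketch numbers the rows of a full cross join per person in random order and keeps the first three. The table contents are invented, and the cross join makes this suitable only for small tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (name TEXT);
    CREATE TABLE mytable (myid INTEGER);
    -- made-up sample data
    INSERT INTO person VALUES ('Ann'), ('Bob'), ('Cid');
    INSERT INTO mytable VALUES (10), (20), (30), (40), (50), (60);
""")

# For every person, shuffle all candidate picks independently
# (PARTITION BY person, ORDER BY random()) and keep ranks 1..3.
picks_sql = """
SELECT name, myid, rn
FROM (
    SELECT p.name, m.myid,
           ROW_NUMBER() OVER (PARTITION BY p.name ORDER BY random()) AS rn
    FROM person AS p
    CROSS JOIN mytable AS m
)
WHERE rn <= 3
ORDER BY name, rn
"""
picks = conn.execute(picks_sql).fetchall()
for pick in picks:
    print(pick)
```

Because the shuffle is partitioned by person, each person gets an independent draw, and the row number doubles as the pick number.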
I was asked this question during an interview for a Junior Oracle Developer position, the interviewer admitted it was a tough one:
Write a query/queries to check if the table 'employees_hist' is an exact copy of the table 'employees'. Any ideas how to go about this?
EDIT: Consider that tables can have duplicate records so a simple MINUS will not work in this case.
EXAMPLE
EMPLOYEES
NAME
--------
Jack Crack
Jack Crack
Jill Hill
EMPLOYEES_HIST
NAME
--------
Jack Crack
Jill Hill
Jill Hill
These two would not be identical.
If the tables have the same columns, you can use this; this will return no rows if the rows in both tables are identical:
(
select * from test_data_01
minus
select * from test_data_02
)
union
(
select * from test_data_02
minus
select * from test_data_01
);
Identical regarding what? Metadata or the actual table data too?
Anyway, use MINUS.
select * from table_1
MINUS
select * from table_2
So, if the two tables are really identical, i.e. the metadata and the actual data, it would return no rows. Else, it would prove that the data is different.
If, you receive an error, it would mean the metadata itself is different.
Update: If the data is not the same because one of the tables has duplicates,
just select the unique records from that table, and apply MINUS against the other table.
One possible solution, which caters for duplicates, is to create a subquery which does a UNION ALL on the two tables, and includes the number of duplicates contained within each table by grouping on all the columns. The outer query can then group on all the columns, including the row count column, and keep only the groups that do not occur exactly twice. If the tables match, there should be no rows returned:
create table employees (name varchar2(100));
create table employees_hist (name varchar2(100));
insert into employees values ('Jack Crack');
insert into employees values ('Jack Crack');
insert into employees values ('Jill Hill');
insert into employees_hist values ('Jack Crack');
insert into employees_hist values ('Jill Hill');
insert into employees_hist values ('Jill Hill');
with both_tables as
(select name, count(*) as row_count
from employees
group by name
union all
select name, count(*) as row_count
from employees_hist
group by name)
select name, row_count from both_tables
group by name, row_count having count(*) <> 2;
gives you:
Name Row_count
Jack Crack 1
Jack Crack 2
Jill Hill 1
Jill Hill 2
This tells you that both names appear once in one table and twice in the other, and therefore the tables don't match.
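The count-and-group comparison can be checked against both a mismatching and a matching pair of tables. The sketch below assumes SQLite through Python's sqlite3 module in place of Oracle; the helper name `table_diff` is my own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (name TEXT);
    CREATE TABLE employees_hist (name TEXT);
    INSERT INTO employees VALUES ('Jack Crack'), ('Jack Crack'), ('Jill Hill');
    INSERT INTO employees_hist VALUES ('Jack Crack'), ('Jill Hill'), ('Jill Hill');
""")

def table_diff(connection):
    """Return (name, row_count) pairs that do not occur in both tables."""
    sql = """
    WITH both_tables AS (
        SELECT name, COUNT(*) AS row_count FROM employees GROUP BY name
        UNION ALL
        SELECT name, COUNT(*) AS row_count FROM employees_hist GROUP BY name
    )
    SELECT name, row_count
    FROM both_tables
    GROUP BY name, row_count
    HAVING COUNT(*) <> 2
    ORDER BY name, row_count
    """
    return connection.execute(sql).fetchall()

mismatch = table_diff(conn)
print(mismatch)

# Make the history table a true copy and compare again.
conn.executescript("""
    DELETE FROM employees_hist;
    INSERT INTO employees_hist SELECT name FROM employees;
""")
match = table_diff(conn)
print(match)
```

The first call reports the four rows shown above; after the history table is rebuilt from `employees`, the result is empty.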
select name, count(*) n from EMPLOYEES group by name
minus
select name, count(*) n from EMPLOYEES_HIST group by name
union all (
select name, count(*) n from EMPLOYEES_HIST group by name
minus
select name, count(*) n from EMPLOYEES group by name)
You could merge the two tables and then subtract one of the tables from the result. If the result of the subtraction is an empty table, then you know that the tables must be the same, since the merge had no effect (every row and column were effectively the same).
How do I merge two tables with different column number while removing duplicates?
That link provides a good way to merge the two tables without duplicates without knowing what the columns are.
Ensure the rows are unique by adding a pseudo column
WITH t1 AS
(SELECT <All_Columns>
, row_number() OVER
(PARTITION BY <All_Columns>
ORDER BY <All_Columns>) row_num
FROM employees)
, t2 AS
(SELECT <All_Columns>
, row_number() OVER
(PARTITION BY <All_Columns>
ORDER BY <All_Columns>) row_num
FROM employees_hist)
(SELECT *
FROM t1
MINUS
SELECT *
FROM t2)
UNION ALL
(SELECT *
FROM t2
MINUS
SELECT *
FROM t1)
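Use row_number to make sure there are no duplicate rows. Now you can use minus, and if there are no results in either direction, the tables are identical. If the tables have more than one column, order by all of them so that the numbering is deterministic.
SELECT ROW_NUMBER() OVER (Order By Name), tab1.*
FROM tab1
MINUS
SELECT ROW_NUMBER() OVER (Order By Name), tab2.*
FROM tab2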
Does anyone happen to know a way of basically taking the 'Distinct' command but only using it on a single column. For lack of example, something similar to this:
Select (Distinct ID), Name, Term from Table
So it would get rid of row with duplicate ID's but still use the other column information. I would use distinct on the full query but the rows are all different due to certain columns data set. And I would need to output only the top most term between the two duplicates:
ID Name Term
1 Suzy A
1 Suzy B
2 John A
2 John B
3 Pete A
4 Carl A
5 Sally B
Any suggestions would be helpful.
select t.ID, t.Name, min(t.Term) as Term
from Table t
group by t.ID, t.Name
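You can use ROW_NUMBER for this. The filter on rn has to go in the outer query, because the alias is not visible inside the subquery that defines it:
Select ID, Name, Term from (
Select ID, Name, Term, ROW_NUMBER ( )
OVER ( PARTITION BY ID order by Term) as rn from Table
) as tbl
Where rn = 1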
Order by determines the order from which the first row will be picked.
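As a quick check of the first-row-per-ID idea, here is a sketch using SQLite through Python's sqlite3 module, loaded with the sample data from the question. The table is called `terms` here, an assumed name, because `Table` is a reserved word.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE terms (id INTEGER, name TEXT, term TEXT);
    INSERT INTO terms VALUES
        (1, 'Suzy', 'A'), (1, 'Suzy', 'B'),
        (2, 'John', 'A'), (2, 'John', 'B'),
        (3, 'Pete', 'A'), (4, 'Carl', 'A'), (5, 'Sally', 'B');
""")

# Number the rows of each id by term, then keep only the first one.
first_per_id_sql = """
SELECT id, name, term
FROM (
    SELECT id, name, term,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY term) AS rn
    FROM terms
) AS numbered
WHERE rn = 1
ORDER BY id
"""
first_rows = conn.execute(first_per_id_sql).fetchall()
for row in first_rows:
    print(row)
```

Changing `ORDER BY term` to `ORDER BY term DESC` would keep the last term of each ID instead.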
I have a table with 2 columns:
name, percentage
I have 100 rows in this table, and want to make a query that selects the 5 rows with the smallest percentage value and the 5 rows with the largest percentage value.
Normally I would do this with LIMIT and OFFSET, but that selects only one of the two groups I am after. I wonder if there is a way of selecting both.
I have been looking for a solution for a while, and I thought about FETCH, but I have not managed to use it correctly.
UNION ALL
(
SELECT name, percentage
FROM tbl
ORDER BY percentage
LIMIT 5
)
UNION ALL
(
SELECT name, percentage
FROM tbl
ORDER BY percentage DESC
LIMIT 5
);
You need parentheses to apply ORDER BY and LIMIT to nested SELECT statements of a UNION query. I quote the manual here:
ORDER BY and LIMIT can be attached to a subexpression if it is
enclosed in parentheses. Without parentheses, these clauses will be
taken to apply to the result of the UNION, not to its right-hand input expression.
UNION (without ALL) would remove duplicates in the result. A useless effort if you don't expect dupes.
Subquery with row_number()
SELECT name, percentage
FROM (
SELECT *
, row_number() OVER (ORDER BY percentage) AS rn_min
, row_number() OVER (ORDER BY percentage DESC) AS rn_max
FROM tbl
) x
WHERE rn_min < 6
OR rn_max < 6;
This collapses duplicates like UNION would. Performance will be similar, probably a bit slower than the first one.
Either way, order by additional columns to break ties in a controlled manner. As is, you get arbitrary rows from groups of peers sharing the same percentage.
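The effect of a tie-breaker can be illustrated with a small experiment. This sketch assumes SQLite through Python's sqlite3 module instead of PostgreSQL, and fills `tbl` with 100 generated rows in which every percentage value occurs twice, so ties are guaranteed. Breaking ties by `name` is an arbitrary choice made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (name TEXT, percentage INTEGER)")
# Generated sample data: n001..n100, percentages 1, 1, 2, 2, ..., 50, 50.
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?)",
    [(f"n{i:03d}", (i + 1) // 2) for i in range(1, 101)],
)

# Rank from both ends, with name as tie-breaker, and keep 5 from each end.
extremes_sql = """
SELECT name, percentage
FROM (
    SELECT name, percentage,
           ROW_NUMBER() OVER (ORDER BY percentage, name) AS rn_min,
           ROW_NUMBER() OVER (ORDER BY percentage DESC, name DESC) AS rn_max
    FROM tbl
) AS x
WHERE rn_min <= 5 OR rn_max <= 5
ORDER BY percentage, name
"""
extremes = conn.execute(extremes_sql).fetchall()
for row in extremes:
    print(row)
```

Without `name` in the two ORDER BY clauses, which of the two rows sharing percentage 3 (or 48) makes the cut would be left to the database.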