Selecting random value for every row - sql

Suppose I have 2 tables: 'FOR_TEST_1' with columns A, B, C and 'FOR_TEST_2' with columns D, E, F.
I would like to return column A, with each row paired with a random value from column D.
Here is a snippet of the tables.
So far, this is the statement I have been using, and it returns the same value of D for every row in A.
I am currently using Toad for Oracle, but I tried the same logic in MySQL and it works fine.

You're expecting Oracle to execute the subquery once per row (which is what MySQL does). However, it seems you have run into a side-effect of an Oracle optimization. There's no correlation between the main query and the scalar subquery so Oracle decides to un-nest the subquery, execute it once and join the result to the main query.
To get the results you want you have a couple of options. One is to turn off the unnesting with the NO_UNNEST hint.
select t1.a
, ( select d from ( select /*+ NO_UNNEST */ d from for_test_two
order by dbms_random.value ) where rownum = 1) d
from for_test_one t1
/
Alternatively, you could rewrite your query to use an inline view rather than a scalar subquery.
select t1.a
, t2.d
from ( select a, rownum as rn
from for_test_one) t1
join ( select d, rownum as rn
from ( select d from for_test_two
order by dbms_random.value() ) ) t2
on t1.rn = t2.rn
order by t1.rn
/
Warning: The NO_UNNEST solution doesn't work in the SQL Fiddle demo. I'm not sure why, as the syntax looks correct. So try it in your own environment, or just use the second approach, which definitely works.
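The row-number join above can be tried outside Oracle. The following is a minimal sketch using Python's sqlite3 module, where SQLite's random() and ROW_NUMBER() stand in for dbms_random.value and rownum (that substitution and the sample data are assumptions for illustration, not part of the original answer).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE for_test_one (a INTEGER);
    CREATE TABLE for_test_two (d TEXT);
    INSERT INTO for_test_one VALUES (1), (2), (3);
    INSERT INTO for_test_two VALUES ('x'), ('y'), ('z'), ('w');
""")

# Number the rows of each table (shuffling the second table while
# numbering it), then join the two on the row number.
pairs = conn.execute("""
    SELECT t1.a, t2.d
    FROM (SELECT a, ROW_NUMBER() OVER (ORDER BY a) AS rn
          FROM for_test_one) AS t1
    JOIN (SELECT d, ROW_NUMBER() OVER (ORDER BY random()) AS rn
          FROM for_test_two) AS t2
      ON t1.rn = t2.rn
    ORDER BY t1.rn
""").fetchall()

for a, d in pairs:
    print(a, d)  # the d values differ from run to run
```

Note the design trade-off: with this approach each D value is used at most once, and if the second table has fewer rows than the first, the unmatched rows of the first table are dropped by the inner join.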

Try this:
SELECT A, (SELECT D FROM (
SELECT D, ROWNUM ROWPTR FROM FOR_TEST_2)
WHERE ROWPTR = (SELECT TRUNC(DBMS_RANDOM.VALUE(1, (SELECT COUNT(D) FROM FOR_TEST_2) + 1)) FROM DUAL)) D
FROM FOR_TEST_1

Related

sql: first row after the last row with a property

I would like to write a query that returns the first row immediately after the last row with a given property (ordered by id). IDs may not be consecutive.
Ideally it would look something like this:
...
JOIN (select max(id) id from my_table where CONDITION) m
JOIN (select min(id) from my_table where id > m.id) n
However, I cannot use the identifier m in the second subselect.
It is possible to use nested queries in nested queries, but is there an easier way?
Thank you.
You could use lead() to get the next id before applying the condition:
select t.*
from my_table t join
(select max(next_id) as max_next_id
from (select t.*, lead(id) over (order by id) as next_id
from my_table t
) t
where <condition>
) tt
on t.id = tt.max_next_id;
You could also do:
select t.*
from my_table t
where t.id > (select max(t2.id) from my_table t2 where <condition>)
order by t.id asc
fetch first 1 row only;
I am not sure how this is getting woven into the rest of your query, so I have used a CTE
WITH max_next AS (
SELECT r.id as max_id
,r.next_id
FROM (
SELECT m.id
,m.next_id
,ROW_NUMBER() OVER (ORDER BY m.id DESC) AS rn
FROM (
SELECT n.* -- to provide data to satisfy CONDITIONS
,LEAD(n.id) OVER(ORDER BY n.id) as next_id
FROM my_table AS n
) AS m
WHERE CONDITIONS
) AS r
WHERE r.rn = 1
)
I would also shrink the n.* to just the columns needed by CONDITIONS rather than relying on the implicit *. The * slows the compile time down (or historically has), as all the metadata needs to be read to work out which columns the * covers. While the compiler can also prune unused columns, it's faster if you just ask for what you want (in the best case it's just a compile-time saving; in the worst case, it reads all the data when you only need a few columns).
And borrowing from Gordon's solution, the ROW_NUMBER part could be simpler:
WITH max_next AS (
SELECT m.id
,m.next_id
--, plus what ever other things you want from m
FROM (
SELECT n.* -- to satisfy CONDITIONS needs
,LEAD(n.id) OVER(ORDER BY n.id) as next_id
FROM my_table AS n
) AS m
WHERE CONDITIONS
ORDER BY m.id DESC LIMIT 1
)
So for an example for #PIG,
WITH my_table AS (
SELECT column1 AS id
,column2 AS con1
,column3 AS other
FROM VALUES (1,'a',123),(2,'b',234),(3,'a',345),(5,'b',456),(7,'a',567),(10,'c',678)
)
SELECT m.id
,m.next_id
,m.other
FROM (
SELECT n.* -- to satisfy CONDITIONS needs
,LEAD(n.id) OVER(ORDER BY n.id) as next_id
FROM my_table AS n
) AS m
WHERE m.con1 = 'b'
ORDER BY m.id DESC LIMIT 1;
This gives 5, 7, 456: the id of the last 'b' row, the id of the row after it, and an extra value from my_table for entertainment purposes. (It was run on Snowflake too, which means I fixed the prior SQL as well.)
This should work; it's pretty straightforward, and it's good that you know records may not be stored in an ordered/consecutive fashion.
SELECT *
FROM my_table
WHERE id = (
SELECT min(id)
FROM my_table
WHERE id > (
SELECT max(id)
FROM my_table
WHERE CONDITION));
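To check the nested MIN/MAX query against the sample data used earlier in this thread, here is a minimal sketch with Python's sqlite3 module. The helper name first_row_after_last and the choice of con1 = ? as the CONDITION are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table (id INTEGER PRIMARY KEY, con1 TEXT, other INTEGER)"
)
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?)",
    [(1, "a", 123), (2, "b", 234), (3, "a", 345),
     (5, "b", 456), (7, "a", 567), (10, "c", 678)],
)

def first_row_after_last(conn, value):
    """Return the row whose id follows the last row with con1 = value."""
    return conn.execute(
        """
        SELECT id, con1, other
        FROM my_table
        WHERE id = (SELECT MIN(id)
                    FROM my_table
                    WHERE id > (SELECT MAX(id)
                                FROM my_table
                                WHERE con1 = ?))
        """,
        (value,),
    ).fetchone()

print(first_row_after_last(conn, "b"))  # (7, 'a', 567)
print(first_row_after_last(conn, "c"))  # None: nothing follows id 10
```

When the last matching row is also the last row of the table, or when no row matches at all, the inner aggregates return NULL and the query yields no row.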

SQL Server query for all columns with group by and having

Is there a way to query all columns with GROUP BY and HAVING in SQL Server? For example, I have 6 columns, a, b,…,f, and this is what I want to get:
Select *
From table
Group by table.b, table.c
Having max(table.d)=table.d
This works in Sybase. I'm trying to migrate from Sybase to SQL Server, and I'm not sure how to do this in the new environment. Thanks.
Why do you want to group by every column when you don't use any aggregate functions in your select? Just use the following code to get all columns of the table:
select * from table
GROUP BY is only used when you have aggregate functions (e.g. max(), avg(), count(), ...) in your select.
HAVING filters on the aggregated values, while WHERE filters on the ordinary columns of the table.
You can use the MIN, MAX, AVG, and COUNT functions with the OVER clause to provide aggregated values for each column (to imitate the GROUP BY clause for each column) and a common table expression (CTE) to filter the results (to imitate the HAVING clause):
;With CTE as
(
SELECT
MIN(a) OVER (PARTITION BY a) AS MinCol_a
, MAX(b) OVER (PARTITION BY b) AS MaxCol_b
, AVG(c) OVER (PARTITION BY c) AS AvgCol_c
, COUNT(e) OVER (PARTITION BY d) AS Counte_PerCol_d
FROM Tbl_Test
)
select MinCol_a,MaxCol_b ,AvgCol_c,Counte_PerCol_d
from CTE
Join --here you can join the table Test results with other tables
where --any filter condition similar to Having clause
If what you want is to get the rows with maximum d for each combination of b and c then use NOT EXISTS:
select t.* from tablename t
where not exists (
select 1 from tablename
where b = t.b and c = t.c and d > t.d
)
or with rank() window function:
select t.a, t.b, t.c, t.d, t.e, t.f
from (
select *,
rank() over (partition by b, c order by d desc) rn
from tablename
) t
where t.rn = 1
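As a rough check that the NOT EXISTS and rank() versions select the same rows, here is a minimal sketch with Python's sqlite3 module. The sample rows are invented for illustration, and only columns a to d are modelled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (a INTEGER, b TEXT, c TEXT, d INTEGER)")
conn.executemany(
    "INSERT INTO tablename VALUES (?, ?, ?, ?)",
    [(1, "x", "p", 10), (2, "x", "p", 30), (3, "x", "q", 5),
     (4, "y", "p", 7), (5, "y", "p", 7)],
)

# Keep a row only if no other row in the same (b, c) group has a larger d.
via_not_exists = conn.execute("""
    SELECT t.a, t.b, t.c, t.d
    FROM tablename AS t
    WHERE NOT EXISTS (SELECT 1 FROM tablename AS u
                      WHERE u.b = t.b AND u.c = t.c AND u.d > t.d)
    ORDER BY t.a
""").fetchall()

# Rank rows within each (b, c) group by d descending and keep rank 1.
via_rank = conn.execute("""
    SELECT a, b, c, d
    FROM (SELECT *, RANK() OVER (PARTITION BY b, c ORDER BY d DESC) AS rn
          FROM tablename) AS t
    WHERE rn = 1
    ORDER BY a
""").fetchall()

print(via_not_exists)
print(via_rank)
```

Both forms keep ties: rows 4 and 5 share the maximum d in their group, so both are returned. Switching to ROW_NUMBER() would keep only one of them.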
You can get the result you want without using HAVING. Try the following:
Select table.b, table.c, max(table.d)
From table
Group by table.b, table.c

Aggregate two columns and rows into one

I have the following table structure
start|end
09:00|11:00
13:00|14:00
I know
SELECT ARRAY_AGG(start), ARRAY_AGG(end)
Will result in
start|end
[09:00,13:00]|[11:00,14:00]
But how can i get the following result?
result
[09:00,11:00,13:00,14:00]
BTW, I'm using Postgres
You could do array concatenation (if order is not important):
SELECT ARRAY_AGG(start) || ARRAY_AGG(end) FROM TABLE1
If order is important, you could use Gordon's approach, but:
- add an ordering to the aggregate: array_agg(d order by d ASC)
- use unnest instead of union all, because Gordon's solution (union all) performs two sequential scans. If the table is big, it could be better for performance to use:
SELECT array_agg(d ORDER BY d ASC) FROM(
SELECT unnest(ARRAY[start] || ARRAY[end]) as d from table1
) sub
which performs only one sequential scan on the table (and will be faster).
One method is to unpivot them and then aggregate:
select array_agg(d)
from (select start as d from t
union all
select end as d from t
) t;
A similar method uses a cross join:
select array_agg(case when n.n = 1 then t.start else t.end end)
from t cross join
(select 1 as n union all select 2) n;
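The unpivot-then-aggregate idea can be sketched with Python's sqlite3 module. SQLite has no array type, so in this sketch the final array_agg step is replaced by collecting the ordered rows into a Python list; that substitution is an assumption of the example, not Postgres behaviour. The column names are quoted because end is a keyword.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE t ("start" TEXT, "end" TEXT)')
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [("09:00", "11:00"), ("13:00", "14:00")],
)

# Unpivot the two columns into one column d, then order the values.
rows = conn.execute("""
    SELECT d
    FROM (SELECT "start" AS d FROM t
          UNION ALL
          SELECT "end" AS d FROM t) AS u
    ORDER BY d
""").fetchall()

# Stand-in for array_agg(d ORDER BY d): gather the single column into a list.
result = [d for (d,) in rows]
print(result)  # ['09:00', '11:00', '13:00', '14:00']
```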
I assume start and end are character types:
select ARRAY_AGG(col)
from(select string_agg(strt::text||','||en::text,',') col
from b
)t

SELECT on two other queries in Oracle

So, let's say I want to do something like:
SELECT Query1.a,
Query2.b
FROM (
SELECT q as a
FROM somewhere
),
(
SELECT g as b
FROM elsewhere
)
where Query 1 is
(
SELECT q as a
FROM somewhere
)
and Query2 is
(
SELECT g as b
FROM elsewhere
)
So, I want to select from two other select statements.
Query 1 produces a table
a
value1
Query 2 produces a table
b
value 2
And Query 3 (the outer select statement) produces
a b
value 1 value 2
So, essentially, two result tables are combined as columns and not as rows.
Thank you, if you have any hints.
You basically have your solution. You are only missing the names of your queries, so do like this:
SELECT Query1.a,
Query2.b
FROM (
SELECT q as a
FROM somewhere
) Query1,
(
SELECT g as b
FROM elsewhere
) Query2
It's not clear how you need to connect different rows from tables but it can be something like this:
select query1.a,
query2.b
FROM
(select q as a, ROW_NUMBER() OVER (ORDER BY q) as RN from a) Query1
FULL JOIN
(select q as b, ROW_NUMBER() OVER (ORDER BY q) as RN from b) Query2
ON Query1.RN=Query2.RN
Your syntax is a bit off, but in essence right.
It is possible to select from a subquery:
select A.field from (select field from a_table) A;
It is essential that you name your query, if you want to use it in the select or where clauses.
And even possible to combine them like regular tables:
select A.field, B.other_field from (select field from table1) A, (select other_field from table2) B;
It is also possible to apply all kinds of filtering, grouping and sorting to it, but that is not needed in your case.
I assume this is what you're looking for:
SELECT query1.a, query2.b
FROM
(SELECT q as a FROM somewhere) query1,
(SELECT g as b FROM elsewhere) query2
Here is a SQLFiddle to test the query
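A minimal sketch of the aliased-subquery form, run through Python's sqlite3 module with invented single-row tables, shows the two results combined as columns. It also shows what happens once either source returns more than one row: the comma join is a cross join, so every row of one side is paired with every row of the other.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE somewhere (q TEXT);
    CREATE TABLE elsewhere (g TEXT);
    INSERT INTO somewhere VALUES ('value1');
    INSERT INTO elsewhere VALUES ('value2');
""")

QUERY = """
    SELECT query1.a, query2.b
    FROM (SELECT q AS a FROM somewhere) AS query1,
         (SELECT g AS b FROM elsewhere) AS query2
"""

# One row on each side gives exactly one combined row.
single = conn.execute(QUERY).fetchall()
print(single)  # [('value1', 'value2')]

# Two rows on each side give the full Cartesian product (2 x 2 rows).
conn.execute("INSERT INTO somewhere VALUES ('value3')")
conn.execute("INSERT INTO elsewhere VALUES ('value4')")
product = conn.execute(QUERY).fetchall()
print(len(product))  # 4
```

If the rows need to be matched one-to-one instead, the ROW_NUMBER() join shown in the earlier answer is the kind of approach to reach for.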

How to use group by with union in T-SQL

How can I use GROUP BY with UNION in T-SQL? I want to group by the first column of the result of a union. I wrote the following SQL, but it doesn't work. I just don't know how to reference a specific column (in this case the first) of the union result.
SELECT *
FROM ( SELECT a.id ,
a.time
FROM dbo.a
UNION
SELECT b.id ,
b.time
FROM dbo.b
)
GROUP BY 1
You need to alias the subquery. Thus, your statement should be:
Select Z.id
From (
Select id, time
From dbo.tablea
Union All
Select id, time
From dbo.tableb
) As Z
Group By Z.id
GROUP BY 1
I've never known GROUP BY to support using ordinals, only ORDER BY. Either way, only MySQL supports GROUP BYs that don't include all columns without aggregate functions performed on them. Ordinals aren't recommended practice either, because they're based on the order of the SELECT: if that changes, so does your ORDER BY (or GROUP BY, if supported).
There's no need to run GROUP BY on the contents when you're using UNION: UNION ensures that duplicates are removed. UNION ALL is faster because it doesn't remove them, and in that case you would need the GROUP BY.
Your query only needs to be:
SELECT a.id,
a.time
FROM dbo.TABLE_A a
UNION
SELECT b.id,
b.time
FROM dbo.TABLE_B b
Identifying the column is easy:
SELECT *
FROM ( SELECT id,
time
FROM dbo.a
UNION
SELECT id,
time
FROM dbo.b
) AS u
GROUP BY id
But it doesn't solve the main problem of this query: what's to be done with the second column's values upon grouping by the first? Since (peculiarly!) you're using UNION rather than UNION ALL, you won't have entirely duplicated rows between the two subtables in the union, but you may still very well have several values of time for one value of id, and you give no hint of what you want to do with them: min, max, avg, sum, or what?! The SQL engine should give an error because of that (though some, such as MySQL, just pick a random-ish value out of the several; I believe SQL Server is better than that).
So, for example, change the first line to SELECT id, MAX(time) or the like!
with UnionTable as
(
SELECT a.id, a.time FROM dbo.a
UNION
SELECT b.id, b.time FROM dbo.b
) SELECT id FROM UnionTable GROUP BY id
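Putting the two points from this thread together (alias the derived table, and pick an aggregate for the second column), here is a minimal sketch using Python's sqlite3 module. The sample rows and the choice of MAX(time) are assumptions for illustration; the tables a and b stand in for dbo.a and dbo.b.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER, time TEXT);
    CREATE TABLE b (id INTEGER, time TEXT);
    INSERT INTO a VALUES (1, '2020-01-01'), (2, '2020-01-05');
    INSERT INTO b VALUES (1, '2020-02-01'), (3, '2020-03-01');
""")

# The derived table gets an alias (u), and the non-grouped column
# is wrapped in an aggregate so each id yields exactly one row.
latest = conn.execute("""
    SELECT u.id, MAX(u.time) AS latest_time
    FROM (SELECT id, time FROM a
          UNION ALL
          SELECT id, time FROM b) AS u
    GROUP BY u.id
    ORDER BY u.id
""").fetchall()

print(latest)  # [(1, '2020-02-01'), (2, '2020-01-05'), (3, '2020-03-01')]
```

UNION ALL is used here because the GROUP BY already collapses the rows per id, so the duplicate removal that UNION performs would be extra work.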