Generate random pairs SQL

Suppose we have these two tables.
TABLE_1:
|column_1 | ... |
--------------------
| 'a' | ... |
| 'b' | ... |
| 'c' | ... |
| 'd' | ... |
| 'e' | ... |
TABLE_2:
|column_1 | ... |
--------------------
| 1 | ... |
| 2 | ... |
| 3 | ... |
| 4 | ... |
| 5 | ... |
I want to pair every row of TABLE_1 with a random number (1, 2, or 3) of distinct, randomly chosen rows from TABLE_2.
An output could be:
|column_1 | column_2 |
---------------------------
| 'a' | 1 |
| 'a' | 2 |
| 'a' | 5 |
| 'b' | 5 |
| 'c' | 3 |
| 'c' | 4 |
| 'd' | 3 |
| 'e' | 3 |
| 'e' | 5 |
| 'e' | 1 |

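JOIN LATERAL did the trick for me. ORDER BY RANDOM() shuffles TABLE_2 before the LIMIT is applied; without it the LIMIT would always keep the same leading rows. The WHERE clause references t1 only to make the subquery depend on the outer row, because the planner may otherwise evaluate an uncorrelated subquery once and reuse its result for every row of TABLE_1.
SELECT t1.column_1, a.column_1 AS column_2
FROM TABLE_1 t1
LEFT JOIN LATERAL (
SELECT t2.column_1
FROM TABLE_2 t2
WHERE t1.column_1 IS NOT NULL
ORDER BY RANDOM()
LIMIT FLOOR(RANDOM() * 3 + 1)
) a ON TRUE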
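To make the intended result concrete, here is a minimal Python sketch of the same sampling logic outside the database. The function name `random_pairs` and its parameters are hypothetical and chosen only for illustration; the sketch assumes the second table is not empty.

```python
import random

def random_pairs(left_rows, right_rows, max_per_row=3, rng=random):
    """Pair each left row with 1..max_per_row distinct, randomly chosen right rows."""
    pairs = []
    for left in left_rows:
        # Never ask for more rows than the right-hand table holds.
        k = rng.randint(1, min(max_per_row, len(right_rows)))
        # sample() draws without replacement, so the picks are distinct.
        for right in rng.sample(right_rows, k):
            pairs.append((left, right))
    return pairs

table_1 = ["a", "b", "c", "d", "e"]
table_2 = [1, 2, 3, 4, 5]

# A seeded generator keeps the illustration reproducible.
pairs = random_pairs(table_1, table_2, rng=random.Random(0))
```

Every value of `table_1` appears in `pairs` between one and three times, each time with a different value from `table_2`.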

Related

SQL return only rows where value exists multiple times and other value is present

I have a table like this in MS SQL SERVER
+------+------+
| ID | Cust |
+------+------+
| 1 | A |
| 1 | A |
| 1 | B |
| 1 | B |
| 2 | A |
| 2 | A |
| 2 | A |
| 2 | B |
| 3 | A |
| 3 | B |
| 3 | B |
| 3 | C |
| 3 | C |
+------+------+
I don't know the values in column "Cust" and I want to return all rows where the value of "Cust" appears multiple times and where at least one of the "ID" values is "1".
Like this:
+------+------+
| ID | Cust |
+------+------+
| 1 | A |
| 1 | A |
| 1 | B |
| 1 | B |
| 2 | A |
| 2 | A |
| 2 | A |
| 2 | B |
| 3 | A |
| 3 | B |
| 3 | B |
+------+------+
Any ideas? I can't find it.
You can use the COUNT window function as follows:
SELECT ID, Cust
FROM
(
SELECT ID, Cust,
COUNT(*) OVER (PARTITION BY Cust) cn,
COUNT(CASE WHEN ID=1 THEN 1 END) OVER (PARTITION BY Cust) cn2
FROM table_name
) T
WHERE cn>1 AND cn2>0
ORDER BY ID, Cust
COUNT(*) OVER (PARTITION BY Cust) checks whether the value of "Cust" appears multiple times.
COUNT(CASE WHEN ID=1 THEN 1 END) OVER (PARTITION BY Cust) checks that at least one of the "ID" values for that "Cust" is "1".
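The query can be tried against the sample data without a SQL Server instance. The sketch below runs it through Python's built-in sqlite3 module, which is an assumption made for convenience: it needs a SQLite build with window-function support (3.25 or later), and it is a stand-in for checking the logic rather than a statement about SQL Server behaviour.

```python
import sqlite3

# Sample rows from the question, already in (ID, Cust) order.
rows = [(1, "A"), (1, "A"), (1, "B"), (1, "B"),
        (2, "A"), (2, "A"), (2, "A"), (2, "B"),
        (3, "A"), (3, "B"), (3, "B"), (3, "C"), (3, "C")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (ID INTEGER, Cust TEXT)")
conn.executemany("INSERT INTO table_name VALUES (?, ?)", rows)

query = """
SELECT ID, Cust
FROM (
    SELECT ID, Cust,
           COUNT(*) OVER (PARTITION BY Cust) AS cn,
           COUNT(CASE WHEN ID = 1 THEN 1 END) OVER (PARTITION BY Cust) AS cn2
    FROM table_name
) T
WHERE cn > 1 AND cn2 > 0
ORDER BY ID, Cust
"""
result = conn.execute(query).fetchall()
```

With this data, `result` holds every A and B row and drops the two C rows, because C never occurs with ID 1.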

SQL: Get row number which increases every time a value changes

I have the following table in Vertica:
+----------+----------+----------+
| column_1 | column_2 | column_3 |
+----------+----------+----------+
| a | 1 | 1 |
| a | 2 | 1 |
| a | 3 | 1 |
| b | 1 | 1 |
| b | 2 | 1 |
| b | 3 | 1 |
| c | 1 | 1 |
| c | 2 | 1 |
| c | 3 | 1 |
| c | 1 | 2 |
| c | 2 | 2 |
| c | 3 | 2 |
+----------+----------+----------+
The table is ordered by column_1 and column_3.
I would like to add a row number, which increases every time when column_1 or column_3 change their value. It would look something like this:
+----------+----------+----------+------------+
| column_1 | column_2 | column_3 | row_number |
+----------+----------+----------+------------+
| a | 1 | 1 | 1 |
| a | 2 | 1 | 1 |
| a | 3 | 1 | 1 |
| b | 1 | 1 | 2 |
| b | 2 | 1 | 2 |
| b | 3 | 1 | 2 |
| c | 1 | 1 | 3 |
| c | 2 | 1 | 3 |
| c | 3 | 1 | 3 |
| c | 1 | 2 | 4 |
| c | 2 | 2 | 4 |
| c | 3 | 2 | 4 |
+----------+----------+----------+------------+
I tried using partition over but I can't find the right syntax.
Vertica has the CONDITIONAL_CHANGE_EVENT() analytic function.
It starts at 0 and increments by 1 every time the expression passed as its first argument changes value.
Like so:
WITH
indata(column_1,column_2,column_3,rn) AS (
SELECT 'a',1,1,1
UNION ALL SELECT 'a',2,1,1
UNION ALL SELECT 'a',3,1,1
UNION ALL SELECT 'b',1,1,2
UNION ALL SELECT 'b',2,1,2
UNION ALL SELECT 'b',3,1,2
UNION ALL SELECT 'c',1,1,3
UNION ALL SELECT 'c',2,1,3
UNION ALL SELECT 'c',3,1,3
UNION ALL SELECT 'c',1,2,4
UNION ALL SELECT 'c',2,2,4
UNION ALL SELECT 'c',3,2,4
)
SELECT
*
, CONDITIONAL_CHANGE_EVENT(
column_1||column_3::VARCHAR
) OVER w + 1 AS rownum
FROM indata
WINDOW w AS (ORDER BY column_1,column_3,column_2)
;
-- out column_1 | column_2 | column_3 | rn | rownum
-- out ----------+----------+----------+----+--------
-- out a | 1 | 1 | 1 | 1
-- out a | 2 | 1 | 1 | 1
-- out a | 3 | 1 | 1 | 1
-- out b | 1 | 1 | 2 | 2
-- out b | 2 | 1 | 2 | 2
-- out b | 3 | 1 | 2 | 2
-- out c | 1 | 1 | 3 | 3
-- out c | 2 | 1 | 3 | 3
-- out c | 3 | 1 | 3 | 3
-- out c | 1 | 2 | 4 | 4
-- out c | 2 | 2 | 4 | 4
-- out c | 3 | 2 | 4 | 4
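For readers without access to Vertica, the behaviour described above (a counter that starts at 0 and goes up whenever the expression changes) can be imitated in a few lines of Python. This is only an emulation of that description, not Vertica's implementation, and the helper name `conditional_change_event` is made up for the sketch.

```python
def conditional_change_event(values):
    """Return a 0-based counter that increments each time the value changes."""
    counter = 0
    previous = None
    out = []
    for i, value in enumerate(values):
        # The first row never counts as a change.
        if i > 0 and value != previous:
            counter += 1
        out.append(counter)
        previous = value
    return out

# (column_1, column_2, column_3) rows from the question.
rows = [(c1, c2, c3)
        for c1, c3 in [("a", 1), ("b", 1), ("c", 1), ("c", 2)]
        for c2 in (1, 2, 3)]

# Same ordering as WINDOW w AS (ORDER BY column_1, column_3, column_2).
rows.sort(key=lambda r: (r[0], r[2], r[1]))

# Same expression as column_1 || column_3::VARCHAR, then + 1 to start at 1.
keys = [r[0] + str(r[2]) for r in rows]
rownum = [c + 1 for c in conditional_change_event(keys)]
```

`rownum` comes out as three 1s, three 2s, three 3s and three 4s, matching the `rownum` column in the output above.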
In the absence of an ORDER BY, SQL data sets are unordered. To establish the order in your example, I've therefore assumed the data set can be sorted with ORDER BY column_1, column_3, column_2.
If that assumption doesn't hold, you MUST add additional columns by which the data can be deterministically sorted.
That gives the following query:
SELECT
yourTable.*,
DENSE_RANK() OVER (ORDER BY column_1, column_3) AS row_number
FROM
yourTable
ORDER BY
column_1, column_3, column_2
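DENSE_RANK is standard SQL, so the query can be checked outside Vertica. The sketch below runs it with Python's built-in sqlite3 module, assuming a SQLite build with window functions (3.25 or later); the alias `grp` is used instead of `row_number` purely to avoid clashing with the function name.

```python
import sqlite3

# (column_1, column_2, column_3) rows from the question.
rows = [(c1, c2, c3)
        for c1, c3 in [("a", 1), ("b", 1), ("c", 1), ("c", 2)]
        for c2 in (1, 2, 3)]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE yourTable (column_1 TEXT, column_2 INTEGER, column_3 INTEGER)"
)
conn.executemany("INSERT INTO yourTable VALUES (?, ?, ?)", rows)

query = """
SELECT column_1, column_2, column_3,
       DENSE_RANK() OVER (ORDER BY column_1, column_3) AS grp
FROM yourTable
ORDER BY column_1, column_3, column_2
"""
result = conn.execute(query).fetchall()
groups = [row[3] for row in result]
```

Rows that share the same (column_1, column_3) pair tie in the ranking, and DENSE_RANK leaves no gaps after ties, which is why `groups` counts 1, 2, 3, 4 in blocks of three.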
This would also work and doesn't require sorting the table:
- Find the distinct combinations of column_1 and column_3 and give each one a new index.
- Join the result back to the original table on column_1 and column_3.
select t1.*, t2.row_number
from
your_table t1
join
(select column_1, column_3, row_number() over (order by column_1, column_3) as row_number from (select distinct column_1, column_3 from your_table) foo) t2
on
t1.column_1=t2.column_1 and t1.column_3=t2.column_3;
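The two steps (number the distinct column_1/column_3 combinations, then join them back) can be sketched as follows. The example uses Python's built-in sqlite3 module as a stand-in engine, which assumes SQLite 3.25 or later, and it numbers the combinations with an explicit ORDER BY so that the result is deterministic; the alias `grp` is an illustrative name.

```python
import sqlite3

rows = [(c1, c2, c3)
        for c1, c3 in [("a", 1), ("b", 1), ("c", 1), ("c", 2)]
        for c2 in (1, 2, 3)]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE your_table (column_1 TEXT, column_2 INTEGER, column_3 INTEGER)"
)
conn.executemany("INSERT INTO your_table VALUES (?, ?, ?)", rows)

query = """
SELECT t1.column_1, t1.column_2, t1.column_3, t2.grp
FROM your_table t1
JOIN (
    -- Step 1: one row per distinct (column_1, column_3), numbered in order.
    SELECT column_1, column_3,
           ROW_NUMBER() OVER (ORDER BY column_1, column_3) AS grp
    FROM (SELECT DISTINCT column_1, column_3 FROM your_table) d
) t2
  -- Step 2: attach the number to every original row.
  ON t1.column_1 = t2.column_1 AND t1.column_3 = t2.column_3
ORDER BY t1.column_1, t1.column_3, t1.column_2
"""
result = conn.execute(query).fetchall()
groups = [row[3] for row in result]
```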

SQL based full text search for given args within group of rows

I'm trying to search for specific data in a database table (Oracle 12c). I want to search for specific texts within groups of rows. Each group has its own ID, and I would like to get the ID of a group if all of the search arguments can be found somewhere in it.
I prepared a sample table with some simplifications:
- The real table has more than 20 columns and millions of rows.
- I converted the real values to shorter ones like a or b; in the real table the columns are VARCHAR(500).
- There can be thousands of rows in the same group (same ID).
- The search has to be fast, so manipulating too much of this data or using many nested queries might not be an option.
Sample Table:
+----+----+---+---+----+
| ID | A | B | C | D |
+----+----+---+---+----+
| 1 | aq | a | a | a |
| 1 | a | a | c | ad |
| 1 | a | a | a | a |
| 2 | a | a | a | a |
| 2 | a | a | a | a |
| 2 | a | a | a | a |
| 3 | a | a | a | a |
| 3 | a | a | a | a |
| 3 | a | d | a | a |
+----+----+---+---+----+
Sample Cases:
+------+-------------+-----------+
| Case | Searching | Expected |
+------+-------------+-----------+
| 1 | `q` and `c` | [1] |
| 2 | `a` and `d` | [1, 3] |
| 3 | `a` and `q` | [1] |
| 4 | `a` | [1, 2, 3] |
+------+-------------+-----------+
Case 1:
ID = 1 - matching q and c in two rows
Result = Row [1]
+----+----+---+---+----+
| ID | A | B | C | D |
+----+----+---+---+----+
| 1 | aq | a | a | a | <-- q
| 1 | a | a | c | ad | <-- c
| 1 | a | a | a | a |
| 2 | a | a | a | a |
| 2 | a | a | a | a |
| 2 | a | a | a | a |
| 3 | a | a | a | a |
| 3 | a | a | a | a |
| 3 | a | d | a | a |
+----+----+---+---+----+
Case 2:
ID = 2 - doesn't have d anywhere
Result: Rows [1, 3]
+----+----+---+---+----+
| ID | A | B | C | D |
+----+----+---+---+----+
| 1 | aq | a | a | a | <-- a
| 1 | a | a | c | ad | <-- a, d
| 1 | a | a | a | a | <-- a
| 2 | a | a | a | a | <-- a
| 2 | a | a | a | a | <-- a
| 2 | a | a | a | a | <-- a
| 3 | a | a | a | a | <-- a
| 3 | a | a | a | a | <-- a
| 3 | a | d | a | a | <-- a, d
+----+----+---+---+----+
Case 3:
ID = 1 - matching a and q in a single row
Result: Row [1]
+----+----+---+---+----+
| ID | A | B | C | D |
+----+----+---+---+----+
| 1 | aq | a | a | a | <-- a, q
| 1 | a | a | c | ad | <-- a
| 1 | a | a | a | a | <-- a
| 2 | a | a | a | a | <-- a
| 2 | a | a | a | a | <-- a
| 2 | a | a | a | a | <-- a
| 3 | a | a | a | a | <-- a
| 3 | a | a | a | a | <-- a
| 3 | a | d | a | a | <-- a
+----+----+---+---+----+
Case 4:
`a` appears in every group
Result: Rows [1, 2, 3]
+----+----+---+---+----+
| ID | A | B | C | D |
+----+----+---+---+----+
| 1 | aq | a | a | a | <-- a
| 1 | a | a | c | ad | <-- a
| 1 | a | a | a | a | <-- a
| 2 | a | a | a | a | <-- a
| 2 | a | a | a | a | <-- a
| 2 | a | a | a | a | <-- a
| 3 | a | a | a | a | <-- a
| 3 | a | a | a | a | <-- a
| 3 | a | d | a | a | <-- a
+----+----+---+---+----+
Any help appreciated :), thanks
You could use listagg to:
- Concatenate all the columns into one
- Group the rows for each id into one string
Which gives:
create table t (
id int, a varchar2(2), b varchar2(1), c varchar2(1), d varchar2(2)
);
insert into t values (1, 'aq', 'a', 'a', 'a');
insert into t values (1, 'a', 'a', 'c', 'ad');
insert into t values (1, 'a', 'a', 'a', 'a');
insert into t values (2, 'a', 'a', 'a', 'a');
insert into t values (2, 'a', 'a', 'a', 'a');
insert into t values (2, 'a', 'a', 'a', 'a');
insert into t values (3, 'a', 'a', 'a', 'a');
insert into t values (3, 'a', 'a', 'a', 'a');
insert into t values (3, 'a', 'd', 'a', 'a');
commit;
with vals as (
select t.id,
listagg ( a || b || c || d )
within group ( order by a ) str
from t
group by t.id
)
select * from vals
where str like '%q%'
and str like '%c%';
ID STR
1 aaaaaacadaqaaa
with vals as (
select t.id,
listagg ( a || b || c || d )
within group ( order by a ) str
from t
group by t.id
)
select * from vals
where str like '%a%'
and str like '%d%';
ID STR
1 aaaaaacadaqaaa
3 aaaaaaaaadaa
Fair warning: this is likely to be slow!
You may be able to mitigate this by placing the listagg query in a materialized view.
Also, with 20+ columns, some up to 500 characters long, you're likely to exceed the character limit for listagg unless you've enabled extended data types, which allow varchar2 values of up to 32,767 bytes in SQL.
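The concatenate-then-search idea can be tried on the sample data without an Oracle instance. In the sketch below, SQLite's GROUP_CONCAT stands in for Oracle's listagg, and the helper `matching_ids` is a hypothetical name; both are assumptions made for illustration, so this says nothing about performance or length limits in Oracle.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, a TEXT, b TEXT, c TEXT, d TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?, ?)", [
    (1, "aq", "a", "a", "a"), (1, "a", "a", "c", "ad"), (1, "a", "a", "a", "a"),
    (2, "a", "a", "a", "a"), (2, "a", "a", "a", "a"), (2, "a", "a", "a", "a"),
    (3, "a", "a", "a", "a"), (3, "a", "a", "a", "a"), (3, "a", "d", "a", "a"),
])

def matching_ids(terms):
    """Return the ids whose concatenated group text contains every term."""
    # One LIKE test per term; fall back to a always-true filter for no terms.
    where = " AND ".join("str LIKE ?" for _ in terms) or "1 = 1"
    query = f"""
        WITH vals AS (
            SELECT id, GROUP_CONCAT(a || b || c || d, '') AS str
            FROM t
            GROUP BY id
        )
        SELECT id FROM vals WHERE {where} ORDER BY id
    """
    # Terms are bound as parameters; % or _ inside a term still act as wildcards.
    params = [f"%{term}%" for term in terms]
    return [row[0] for row in conn.execute(query, params)]
```

Calling `matching_ids(["q", "c"])` reproduces case 1 from the question, and the other three cases can be checked the same way.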
You can try the following code:
SELECT
ID
FROM
(
SELECT
ID,
RTRIM(XMLAGG(XMLELEMENT(E, A || B || C || D, ',').EXTRACT('//text()')).GETCLOBVAL(), ',')
AS CONSOLIDATED_VALUE
FROM
T
GROUP BY
ID
)
WHERE
CONSOLIDATED_VALUE LIKE '%q%'
AND CONSOLIDATED_VALUE LIKE '%c%'
Cheers!!

Find all IDs which have a row with value A AND a row with value B

I have a table like this:
+-----+-------+-----+
| id | value | ... |
+-----+-------+-----+
| 1 | A | ... |
| 1 | B | ... |
| 1 | C | ... |
| 2 | B | ... |
| 2 | C | ... |
| 3 | A | ... |
| 3 | C | ... |
| 4 | B | ... |
| 4 | A | ... |
| ... | ... | ... |
+-----+-------+-----+
I want to limit this to just ids that have both rows with A and rows with B in the value column. In this case, the table would look like this:
+-----+-------+-----+
| id | value | ... |
+-----+-------+-----+
| 1 | A | ... |
| 1 | B | ... |
| 1 | C | ... |
| 4 | B | ... |
| 4 | A | ... |
| ... | ... | ... |
+-----+-------+-----+
… because neither id 2 nor 3 had both A and B in the value column.
Is there a succinct way to locate these IDs?
select id, value
from t
where id in (
select id
from t
group by id
having bool_or(value = 'A') and bool_or(value = 'B')
)
Or, with EXISTS:
select id, value
from t t0
where
exists (
select 1
from t
where id = t0.id and value = 'A'
) and
exists (
select 1
from t
where id = t0.id and value = 'B'
)
One way to do this is to count the number of distinct A/B values each id has:
SELECT *
FROM mytable
WHERE id IN (SELECT id
FROM mytable
WHERE value in ('A', 'B')
GROUP BY id
HAVING COUNT(DISTINCT value) = 2)
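The COUNT(DISTINCT ...) form and the EXISTS form should return the same rows, which is easy to confirm on the sample table. The sketch below does so with Python's built-in sqlite3 module; the single table name `t` and the added ORDER BY clauses are assumptions made so the two results can be compared directly.

```python
import sqlite3

rows = [(1, "A"), (1, "B"), (1, "C"),
        (2, "B"), (2, "C"),
        (3, "A"), (3, "C"),
        (4, "B"), (4, "A")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, value TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", rows)

# Keep ids that have exactly two distinct values among 'A' and 'B'.
count_query = """
SELECT id, value
FROM t
WHERE id IN (SELECT id
             FROM t
             WHERE value IN ('A', 'B')
             GROUP BY id
             HAVING COUNT(DISTINCT value) = 2)
ORDER BY id, value
"""

# Keep ids for which an 'A' row and a 'B' row both exist.
exists_query = """
SELECT id, value
FROM t t0
WHERE EXISTS (SELECT 1 FROM t WHERE id = t0.id AND value = 'A')
  AND EXISTS (SELECT 1 FROM t WHERE id = t0.id AND value = 'B')
ORDER BY id, value
"""

by_count = conn.execute(count_query).fetchall()
by_exists = conn.execute(exists_query).fetchall()
```

Both queries keep all rows of ids 1 and 4, including the C row of id 1, and drop ids 2 and 3.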

Select 5 of each distinct value

I have the following table in PostgreSQL:
| a | b | c |
===================
| 'w' | 2 | 3 |
| 'w' | 7 | 2 |
| 'w' | 8 | 1 |
| 'w' | 3 | 6 |
| 'w' | 0 | 8 |
| 'w' | 2 | 9 |
| 'w' | 2 | 9 |
| 'z' | 4 | 9 |
| 'z' | 0 | 9 |
| 'z' | 0 | 8 |
| 'z' | 3 | 6 |
| 'z' | 2 | 7 |
| 'z' | 3 | 1 |
| 'z' | 3 | 2 |
| 'z' | 3 | 3 |
I want to select all records, but limit them to 5 records for each distinct value in column a.
So the result would look like:
| a | b | c |
===================
| 'w' | 2 | 3 |
| 'w' | 7 | 2 |
| 'w' | 8 | 1 |
| 'w' | 3 | 6 |
| 'w' | 0 | 8 |
| 'z' | 4 | 9 |
| 'z' | 0 | 9 |
| 'z' | 0 | 8 |
| 'z' | 3 | 6 |
| 'z' | 2 | 7 |
What is the most efficient way to achieve that in RoR? Thanks!
You can use row_number, but you have to specify an order or you will get unpredictable results:
with cte as (
select
*,
row_number() over(partition by a order by b, c) as row_num
from table1
)
select a, b, c
from cte
where row_num <= 5
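To see which rows the ORDER BY b, c choice keeps, the query can be run on the sample data. The sketch below uses Python's built-in sqlite3 module in place of PostgreSQL, which assumes SQLite 3.25 or later for window functions; the final ORDER BY is added only to make the output order predictable.

```python
import sqlite3

rows = [("w", 2, 3), ("w", 7, 2), ("w", 8, 1), ("w", 3, 6),
        ("w", 0, 8), ("w", 2, 9), ("w", 2, 9),
        ("z", 4, 9), ("z", 0, 9), ("z", 0, 8), ("z", 3, 6),
        ("z", 2, 7), ("z", 3, 1), ("z", 3, 2), ("z", 3, 3)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (a TEXT, b INTEGER, c INTEGER)")
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)", rows)

query = """
WITH cte AS (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY a ORDER BY b, c) AS row_num
    FROM table1
)
SELECT a, b, c
FROM cte
WHERE row_num <= 5
ORDER BY a, row_num
"""
result = conn.execute(query).fetchall()
```

Note that the five rows kept per value of `a` are the five smallest by (b, c), not the first five in insertion order shown in the question; a table has no inherent row order, so some explicit ordering column is needed to get "the first five".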