Populate "Lookup Table" with random values - sql

I have three tables, A B and C. For every entry in A x B (where x is a Cartesian product, or cross join) there is an entry in C.
In other words, the table for C might look like this, if there were 2 entries for A and 3 for B:
| A_ID | B_ID | C_Val |
----------------------|
| 1 | 1 | 100 |
| 1 | 2 | 56 |
| 1 | 3 | 19 |
| 2 | 1 | 67 |
| 2 | 2 | 0 |
| 2 | 3 | 99 |
Thus, for any combination of A and B, there's a value to be looked up in C. I hope this all makes sense.
In practice, the size of A x B may be relatively small for a database, but far too large to populate by hand for testing data. Thus, I would like to randomly populate C's table for whatever data may already be in A and B.
My knowledge of SQL is fairly basic. What I've determined I can do so far is get that cartesian product as an inner query, like so:
(SELECT B.B_ID, C.C_ID
FROM B CROSS JOIN C)
Then I want to say something like follows:
INSERT INTO A(B_ID, C_ID, A_Val) VALUES
(SELECT B.B_ID, C.C_ID, FLOOR(RAND() * 100)
FROM B CROSS JOIN C)
Not surprisingly, this doesn't work. I don't think it's valid syntax to generate a column on the fly like that, nor to try to insert a whole table as values.
How can I basically convert this normal programming pseudocode to proper SQL?
foreach(A_ID in A){
foreach(B_ID in B){
C.insert(A_ID, B_ID, Rand(100));
}
}

The syntax problem is because:
INSERT INTO A(B_ID, C_ID, A_Val) VALUES
(SELECT B.B_ID, C.C_ID, FLOOR(RAND() * 100)
FROM B CROSS JOIN C)
Should be:
INSERT INTO A(B_ID, C_ID, A_Val)
SELECT B.B_ID, C.C_ID, FLOOR(RAND() * 100)
FROM B CROSS JOIN C;
(You don't use VALUES with INSERT/SELECT.)
However, you will still have the problem that RAND() is not evaluated for every row; it will have the same value for every row. Assuming the combination of B_ID and C_ID is unique, you can use something like this:
INSERT INTO A(B_ID, C_ID, A_Val)
SELECT B.B_ID, C.C_ID, ABS(CHECKSUM(RAND(B.B_ID*C.C_ID))) % 100
FROM B CROSS JOIN C;
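A minimal sketch of the same INSERT ... SELECT pattern, run through Python's sqlite3 module so it can be tried without a server. It follows the question's prose naming (C holds one row per A x B pair), and SQLite's random() stands in for RAND()/NEWID(). Both choices are assumptions made for illustration; in SQLite random() is evaluated once per row, which is not true of RAND() in SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (A_ID INTEGER PRIMARY KEY);
    CREATE TABLE B (B_ID INTEGER PRIMARY KEY);
    CREATE TABLE C (A_ID INTEGER, B_ID INTEGER, C_Val INTEGER,
                    PRIMARY KEY (A_ID, B_ID));
    INSERT INTO A VALUES (1), (2);
    INSERT INTO B VALUES (1), (2), (3);
""")

# INSERT ... SELECT (no VALUES keyword): one row per (A_ID, B_ID) pair.
# random() % 101 lies in -100..100, so abs() maps it into 0..100.
conn.execute("""
    INSERT INTO C (A_ID, B_ID, C_Val)
    SELECT A.A_ID, B.B_ID, abs(random() % 101)
    FROM A CROSS JOIN B
""")

rows = conn.execute(
    "SELECT A_ID, B_ID, C_Val FROM C ORDER BY A_ID, B_ID"
).fetchall()
for row in rows:
    print(row)
```

The values differ on every run; only the set of (A_ID, B_ID) pairs is fixed.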

select A_id,B_Id, abs(checksum(newid()))%101 as C_val from A cross join B
This will give you a different value in the range 0 to 100 for each row.

Use CTE
With cte as
(SELECT B.B_ID, C.C_ID, ABS(CAST(CAST(NEWID() AS VARBINARY) AS INT)) as A_Val
FROM B CROSS JOIN C)
Insert into Table(B_ID, C_ID, A_Val)
Select B_ID,C_ID,A_Val from cte
Since RAND() generates the same number for every row, you can use NEWID() instead.

Related

Query to select rows with minimum distinct value of a column

I need to select the row with the minimum value of column B for each value of column A, but B must be distinct from the values already selected for earlier values of A. So the order of A matters. Also, if the B values are used up and none is left, the later values of A should be NULL or not appear in the result.
Both A and B are numerical (or time stamp).
example:
A | B |
----+---+
1 | 3 |
1 | 5 |
1 | 6 |
2 | 3 |
2 | 5 |
9 | 3 |
9 | 5 |
So the desired result is:
A | B |
----+---+
1 | 3 |
2 | 5 |
select A, min(B) group by A obviously doesn't work because I don't want B to be repeated. Distinct also doesn't work because the rows are already distinct. I couldn't really find any question similar to this anywhere.
The actual data I am working with is a time series database on Redshift, so A and B are timestamps. CTEs would be specifically welcome.
At first I thought this could be solved with ROW_NUMBER() OVER (PARTITION BY A ORDER BY B); however, there is a problem: the values in B must not be repeated.
At the moment the only thing that comes to mind is to use temporary table variables. I know this is not the best way, but you can probably improve on it:
DECLARE @Tabla1 TABLE(A INT)
DECLARE @Tabla2 TABLE(B INT)
DECLARE @Tabla3 TABLE(A INT, B INT)
DECLARE @A INT, @B INT;
INSERT INTO @Tabla1 SELECT DISTINCT A FROM PRUEBA
WHILE (SELECT COUNT(*) FROM @Tabla1) > 0
BEGIN
-- process the values of A in ascending order, since the order of A matters
SET @A = (SELECT MIN(A) FROM @Tabla1);
-- smallest B for this A that has not been used yet (NULL if none is left)
SET @B = (SELECT MIN(B) FROM PRUEBA WHERE A = @A AND B NOT IN (SELECT B FROM @Tabla2));
-- do not store NULL, otherwise later NOT IN checks would match nothing
IF @B IS NOT NULL INSERT INTO @Tabla2 VALUES (@B)
DELETE FROM @Tabla1 WHERE A = @A
INSERT INTO @Tabla3 SELECT A, B FROM PRUEBA WHERE A = @A AND B = @B
END
SELECT * FROM @Tabla3
You could also use a cursor, but you would have to measure which is more expensive: the cursor or the temporary tables.
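To make the intended rule concrete, here is a small reference sketch of the greedy selection in plain Python: walk the values of A in ascending order and give each one the smallest B that has not been handed out yet. The function name pick_distinct_minimums is hypothetical, and this is only a way to check SQL results against the rule on small data, not a replacement for a set-based query.

```python
def pick_distinct_minimums(rows):
    """Return (a, b) pairs: for each a in ascending order, the smallest unused b."""
    candidates = {}
    for a, b in rows:
        candidates.setdefault(a, set()).add(b)

    used = set()      # B values already assigned to an earlier A
    result = []
    for a in sorted(candidates):
        free = candidates[a] - used
        if free:      # an A with no B left is simply omitted
            b = min(free)
            used.add(b)
            result.append((a, b))
    return result


sample = [(1, 3), (1, 5), (1, 6), (2, 3), (2, 5), (9, 3), (9, 5)]
picked = pick_distinct_minimums(sample)
print(picked)  # [(1, 3), (2, 5)]
```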
This is basically a "find the diagonal" problem. You need to know the rank of B within A and the rank of A within all. I believe this works for the data given:
select A, B from (
select row_number() over (partition by A order by B) as RN,
dense_rank() over (order by A) as DR,
A, B
from <table> )
where RN = DR;
For more complex cases this solution will get more complex.
Addendum:
Because I know it will be asked and this is an interesting problem, I worked out what such a more complex solution would look like:
select min(A) as A, B from (
select decode(A <> nvl(min(A) over (order by DRB, DRA rows between unbounded preceding and 1 preceding),-1), true, 'good', 'no good') as Y,
A, B from (
select dense_rank() over (partition by B order by A) as DRA,
dense_rank() over ( order by B) as DRB,
A, B from <table>
)
where DRA <= DRB
)
where Y = 'good'
group by B
order by A, B;

Get count of foreign key from multiple tables

I have 3 tables, with Table B & C referencing Table A via Foreign Key. I want to write a query in PostgreSQL to get all ids from A and also their total occurrences from B & C.
a | b | c
-----------------------------------
id | txt | id | a_id | id | a_id
---+---- | ---+----- | ---+------
1 | a | 1 | 1 | 1 | 3
2 | b | 2 | 1 | 2 | 4
3 | c | 3 | 3 | 3 | 4
4 | d | 4 | 4 | 4 | 4
Output desired (just the id from A & total count in B & C) :
id | Count
---+-------
1 | 2 -- twice in B
2 | 0 -- occurs nowhere
3 | 2 -- once in B & once in C
4 | 4 -- once in B & thrice in C
SQL so far:
SELECT a_id, COUNT(a_id)
FROM
( SELECT a_id FROM b
UNION ALL
SELECT a_id FROM c
) AS union_table
GROUP BY a_id
The query I wrote fetches from B & C and counts the occurrences. But if the key doesn't occur in B or C, it doesn't show up in the output (e.g. id=2 in the desired output). How can I start my selection from table A and join/union B & C to get the desired output?
If the query involves large parts of b and / or c it is more efficient to aggregate first and join later.
I expect these two variants to be considerably faster:
SELECT a.id
, COALESCE(b.ct, 0) + COALESCE(c.ct, 0) AS bc_ct
FROM a
LEFT JOIN (SELECT a_id, count(*) AS ct FROM b GROUP BY 1) b ON b.a_id = a.id
LEFT JOIN (SELECT a_id, count(*) AS ct FROM c GROUP BY 1) c ON c.a_id = a.id;
You need to account for the possibility that some a_id are not present at all in b and / or c. count() never returns NULL, but that's cold comfort in the face of LEFT JOIN, which leaves you with NULL values for missing rows nonetheless. You must prepare for NULL. Use COALESCE().
Or UNION ALL a_id from both tables, aggregate, then JOIN:
SELECT a.id
, COALESCE(ct.bc_ct, 0) AS bc_ct
FROM a
LEFT JOIN (
SELECT a_id, count(*) AS bc_ct
FROM (
SELECT a_id FROM b
UNION ALL
SELECT a_id FROM c
) bc
GROUP BY 1
) ct ON ct.a_id = a.id;
Probably slower. But still faster than solutions presented so far. And you could do without COALESCE() and still not lose any rows. You might get occasional NULL values for bc_ct, in this case.
Another option:
SELECT
a.id,
(SELECT COUNT(*) FROM b WHERE b.a_id = a.id) +
(SELECT COUNT(*) FROM c WHERE c.a_id = a.id)
FROM
a
Use left join with a subquery:
SELECT a.id, COUNT(x.id)
FROM a
LEFT JOIN (
SELECT id, a_id FROM b
UNION ALL
SELECT id, a_id FROM c
) x ON (a.id = x.a_id)
GROUP BY a.id;
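As a quick check of the "aggregate first, join later" idea against the sample data from the question, here is a sketch using Python's sqlite3 module. SQLite is only a stand-in for PostgreSQL here, and the subquery aliases b_ct and c_ct are names chosen for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY, txt TEXT);
    CREATE TABLE b (id INTEGER PRIMARY KEY, a_id INTEGER REFERENCES a(id));
    CREATE TABLE c (id INTEGER PRIMARY KEY, a_id INTEGER REFERENCES a(id));
    INSERT INTO a VALUES (1, 'a'), (2, 'b'), (3, 'c'), (4, 'd');
    INSERT INTO b VALUES (1, 1), (2, 1), (3, 3), (4, 4);
    INSERT INTO c VALUES (1, 3), (2, 4), (3, 4), (4, 4);
""")

# Count per a_id inside each subquery, then LEFT JOIN so that ids of a
# without any reference survive; COALESCE turns their NULL counts into 0.
counts = conn.execute("""
    SELECT a.id, COALESCE(b_ct.ct, 0) + COALESCE(c_ct.ct, 0) AS bc_ct
    FROM a
    LEFT JOIN (SELECT a_id, count(*) AS ct FROM b GROUP BY a_id) AS b_ct
           ON b_ct.a_id = a.id
    LEFT JOIN (SELECT a_id, count(*) AS ct FROM c GROUP BY a_id) AS c_ct
           ON c_ct.a_id = a.id
    ORDER BY a.id
""").fetchall()
print(counts)  # [(1, 2), (2, 0), (3, 2), (4, 4)]
```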

SQL query - select uncommon values from 2 tables

Today I was asked following question in my interview for a QA and because of incorrect query, I did not get selected. From then on, my mind is itching to get the correct answer for the following scenario:
I was given following 2 tables:
Table A     Table B
-------     -------
  ID          ID
-------     -------
   0           5
   1           6
   2           7
   3           8
   4           9
   5          10
   6
And following output was expected using an SQL query:
  ID
------
   0
   1
   2
   3
   4
   7
   8
   9
  10
Thanks everyone. I really like this forum and from now on will be active here to learn more about SQL. I would like to make it a strong point rather than a weak one, so as not to get rejected in other interviews. I know there is a long way to go. Besides all of your responses, I drafted the following query and would like to know the experts' opinion of it (and the reasoning behind that opinion):
BTW the query has worked on MSSQLSRV-2008 (using Union or Union All, didn't matter to the result I got):
select ID from A where ID not in (5,6)
union
select ID from B where ID not in (5,6)
Is this really an efficient query?
If you want values in only one of two tables, I would use a full outer join and condition:
select coalesce(a.id, b.id)
from tableA a full outer join
tableB b
on a.id = b.id
where a.id is null or b.id is null;
Of course, if the job is at a company that uses MS Access or MySQL, then this isn't the right answer, because these systems don't support full outer join. You can also do this in more complicated ways using union all and aggregation or even with other methods.
EDIT:
Here is another method:
select id
from (select a.id, 1 as isa, 0 as isb from tablea a union all
select b.id, 0, 1 from tableb b
) ab
group by id
having sum(isa) = 0 or sum(isb) = 0;
And another:
select a.id
from tablea a
where a.id not in (select id from tableb)
union all
select b.id
from tableb b
where b.id not in (select id from tablea);
As I think about this, it is a pretty good interview question (even though I've just given three reasonable answers).
Edit: See Gordon's answer above for a better query; this is a very inefficient way of doing what you want.
I think this should do the trick :
(SELECT * FROM A WHERE NOT id IN (SELECT A.id FROM A, B WHERE A.id = B.id))
UNION
(SELECT * FROM B WHERE NOT id IN (SELECT A.id FROM A, B WHERE A.id = B.id))
You could avoid the duplication of SELECT A.id ...by using a temporary table.
without full outer joins...
Select id
from (Select id from tableA
Union all
Select id from tableB) Z
group by id
Having count(*) = 1
or using Except and Intersect .....
(Select id from tableA Except Select id from tableB)
Union
(Select id from tableB Except Select id from tableA)
or ....
(Select id from tableA union Select id from tableB)
Except
(Select id from tableA intersect Select id from tableB)
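Two of the variants above can be checked against the interview data with a short sketch using Python's sqlite3 module (SQLite is a stand-in here, not the engine from the question). The aliases z, only_a and only_b are names chosen for this example, and the count-based variant assumes that ids are unique within each table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (id INTEGER);
    CREATE TABLE tableB (id INTEGER);
    INSERT INTO tableA VALUES (0), (1), (2), (3), (4), (5), (6);
    INSERT INTO tableB VALUES (5), (6), (7), (8), (9), (10);
""")

# Variant 1: stack both tables; an id seen exactly once is in only one table.
by_count = [r[0] for r in conn.execute("""
    SELECT id
    FROM (SELECT id FROM tableA UNION ALL SELECT id FROM tableB) AS z
    GROUP BY id
    HAVING count(*) = 1
    ORDER BY id
""")]

# Variant 2: (A minus B) united with (B minus A).
by_except = [r[0] for r in conn.execute("""
    SELECT id FROM (SELECT id FROM tableA EXCEPT SELECT id FROM tableB) AS only_a
    UNION
    SELECT id FROM (SELECT id FROM tableB EXCEPT SELECT id FROM tableA) AS only_b
    ORDER BY id
""")]

print(by_count)   # [0, 1, 2, 3, 4, 7, 8, 9, 10]
print(by_except)  # [0, 1, 2, 3, 4, 7, 8, 9, 10]
```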

Distinct Values Ignoring Column Order

I have a table similar to:-
+----+---+---+
| Id | A | B |
+----+---+---+
| 1 | 1 | 2 |
+----+---+---+
| 2 | 2 | 1 |
+----+---+---+
| 3 | 3 | 4 |
+----+---+---+
| 4 | 0 | 5 |
+----+---+---+
| 5 | 5 | 0 |
+----+---+---+
I want to remove all duplicate pairs of values, regardless of which column contains which value, e.g. after whatever the query might be I want to see:-
+----+---+---+
| Id | A | B |
+----+---+---+
| 1 | 1 | 2 |
+----+---+---+
| 3 | 3 | 4 |
+----+---+---+
| 4 | 0 | 5 |
+----+---+---+
I'd like to find a solution in Microsoft SQL Server (has to work in <= 2005, though I'd be interested in any solutions which rely upon >= 2008 features regardless).
In addition, note that A and B are going to be in the range 1-100 (but that's not guaranteed forever. They are surrogate seeded integer foreign keys, however the foreign table might grow to a couple hundred rows max).
I'm wondering whether I'm missing some obvious solution here. The ones which have occurred all seem rather overwrought, though I do think they'd probably work, e.g.:-
Have a subquery return a bitfield with each bit corresponding to one of the ids and use this value to remove duplicates.
Somehow, pivot, remove duplicates, then unpivot. Likely to be tricky.
Thanks in advance!
Test data and sample below.
Basically, we do a self join with an OR criteria so either a=a and b=b OR a=b and b=a.
The WHERE in the subquery gives you the max for each pair to eliminate.
I think this should work for triplicates as well (note I added a 6th row).
DECLARE @t table(id int, a int, b int)
INSERT INTO @t
VALUES
(1,1,2),
(2,2,1),
(3,3,4),
(4,0,5),
(5,5,0),
(6,5,0)
SELECT *
FROM @t
WHERE id NOT IN (
SELECT a.id
FROM @t a
INNER JOIN @t b
ON (a.a=b.a
AND a.b=b.b)
OR
(a.b=b.a
AND a.a = b.b)
WHERE a.id > b.id)
Try:
select min(Id) Id, A, B
from (select Id, A, B from DuplicatesTable where A <= B
union all
select Id, B A, A B from DuplicatesTable where A > B) v
group by A, B
order by 1
Not 100% tested and I'm sure it can be tidied up but it produces your required result:
DECLARE @T TABLE (id INT IDENTITY(1,1), A INT, B INT)
INSERT INTO @T
VALUES (1,2), (2,1), (3,4), (0,5), (5,0);
SELECT *
FROM @T
WHERE id IN (SELECT DISTINCT MIN(id)
FROM (SELECT id, a, b
FROM @T
UNION ALL
SELECT id, b, a
FROM @T) z
GROUP BY a, b)
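Another way to look at the answers above is to normalize each pair to (smaller, larger) and keep the lowest Id per normalized pair. The sketch below tries that on the question's data through Python's sqlite3 module; SQLite is a stand-in for SQL Server, and the CASE expressions are used instead of a two-argument minimum function so the query text stays portable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE DuplicatesTable (Id INTEGER PRIMARY KEY, A INTEGER, B INTEGER);
    INSERT INTO DuplicatesTable VALUES
        (1, 1, 2), (2, 2, 1), (3, 3, 4), (4, 0, 5), (5, 5, 0);
""")

# Group rows by the unordered pair (smaller value, larger value) and keep
# the row with the lowest Id from each group.
kept = conn.execute("""
    SELECT t.Id, t.A, t.B
    FROM DuplicatesTable t
    WHERE t.Id IN (
        SELECT min(Id)
        FROM DuplicatesTable
        GROUP BY CASE WHEN A <= B THEN A ELSE B END,
                 CASE WHEN A <= B THEN B ELSE A END
    )
    ORDER BY t.Id
""").fetchall()
print(kept)  # [(1, 1, 2), (3, 3, 4), (4, 0, 5)]
```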

Postgresql: Insert the cartesian product of two or more sets

By definition, the cartesian product of two sets is the set of all possible pairs of their elements, so {A,B} x {a,b} = {(A,a),(A,b),(B,a),(B,b)}.
Now I want to insert such a cartesian product into a database table (each pair as a row). It is intended to fill the table with default values for each pair, so the data, i.e. the two sets, are not present in the database at this point.
Any idea how to achieve this with postgresql?
EDIT :
With the help of Grzegorz Szpetkowski's answer I was able to produce a query that does what I want to achieve, but it really isn't the prettiest one. Suppose I want to insert the cartesian product of the sets {1,2,3} and {'A','B','C'}.
INSERT INTO "Test"
SELECT * FROM
(SELECT 1 UNION SELECT 2 UNION SELECT 3) P
CROSS JOIN
(SELECT 'A' UNION SELECT 'B' UNION SELECT 'C') Q
Is there any better way to do this?
EDIT2 :
The accepted answer is fine, but I found another version which might be appropriate if it gets more complex:
CREATE TEMP TABLE "Numbers" (ID integer) ON COMMIT DROP;
CREATE TEMP TABLE "Chars" (Char character varying) ON COMMIT DROP;
INSERT INTO "Numbers" (ID) VALUES (1),(2),(3);
INSERT INTO "Chars" (Char) VALUES ('A'),('B'),('C');
INSERT INTO "Test"
SELECT * FROM
"Numbers"
CROSS JOIN
"Chars";
I am not sure if this really answers your question, but in PostgreSQL there is CROSS JOIN defined as:
For every possible combination of rows from T1 and T2 (i.e., a
Cartesian product), the joined table will contain a row consisting of
all columns in T1 followed by all columns in T2. If the tables have N
and M rows respectively, the joined table will have N * M rows.
FROM T1 CROSS JOIN T2 is equivalent to FROM T1, T2. It is also
equivalent to FROM T1 INNER JOIN T2 ON TRUE (see below).
EDIT:
One way is to use VALUES Lists (note that in fact you have no order, use ORDER BY clause to get some ordering):
SELECT N AS number, L AS letter FROM
(VALUES (1), (2), (3)) a(N)
CROSS JOIN
(VALUES ('A'), ('B'), ('C')) b(L);
Result:
number | letter
--------+--------
1 | A
1 | B
1 | C
2 | A
2 | B
2 | C
3 | A
3 | B
3 | C
(9 rows)
BTW:
For more numbers I believe it's handy to use the generate_series function, e.g.:
SELECT n AS number, chr(ascii('A') + L - 1) AS letter
FROM
generate_series(1, 5) N
CROSS JOIN
generate_series(1, 5) L
ORDER BY N, L;
Result:
number | letter
--------+--------
1 | A
1 | B
1 | C
1 | D
1 | E
2 | A
2 | B
2 | C
2 | D
2 | E
3 | A
3 | B
3 | C
3 | D
3 | E
4 | A
4 | B
4 | C
4 | D
4 | E
5 | A
5 | B
5 | C
5 | D
5 | E
(25 rows)
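Putting the pieces together, the two sets can be declared inline and inserted in a single statement. The sketch below uses Python's sqlite3 module as a stand-in for PostgreSQL and declares the sets as CTEs with VALUES lists; the column names num and letter and the CTE names numbers and letters are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "Test" (num INTEGER, letter TEXT)')

# Each CTE holds one of the sets; CROSS JOIN yields every (number, letter)
# pair, and INSERT ... SELECT writes one row per pair.
conn.execute("""
    WITH numbers(n) AS (VALUES (1), (2), (3)),
         letters(l) AS (VALUES ('A'), ('B'), ('C'))
    INSERT INTO "Test" (num, letter)
    SELECT n, l FROM numbers CROSS JOIN letters
""")

pairs = conn.execute(
    'SELECT num, letter FROM "Test" ORDER BY num, letter'
).fetchall()
print(len(pairs))  # 9
print(pairs[:3])   # [(1, 'A'), (1, 'B'), (1, 'C')]
```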