Redshift sample from table based on count of another table - sql

I have TableA of, say, 3000 rows (could be any number < 10000). I need to create TableX with 10000 rows. So I need to select random 10000 - (number of rows in TableA) from TableB (and add in TableA as well) to create TableX. Any ideas please?
Something like this (which obviously wouldn't work):
Create table TableX as
select * from TableA
union
select * from TableB limit (10000 - count(*) from TableA);

You could use union all and window functions. You did not list the table columns, so I assumed col1 and col2:
insert into tableX (col1, col2)
select col1, col2 from table1
union all
select t2.col1, t2.col2
from (select t2.*, row_number() over(order by random()) as rn from table2 t2) t2
inner join (select count(*) cnt from table1) t1 on t2.rn <= 10000 - t1.cnt
The first query in union all selects all rows from table1. The second query assigns random row numbers to rows in table2, and then selects as many rows as needed to reach a total of 10000.
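To make the mechanics concrete, here is a minimal sketch of the same query run against an in-memory SQLite database from Python. SQLite is only a stand-in for Redshift here, and the function name, the target of 10 (instead of 10000) and the generated rows are assumptions for illustration. Note the row number needs an alias (rn) so the join condition can reference it.

```python
import sqlite3

TARGET = 10  # stand-in for the 10000 in the question (assumption)

def fill_to_target(n_table1, n_table2, target=TARGET):
    """Return all table1 rows plus a random sample of table2 rows, up to target."""
    con = sqlite3.connect(":memory:")
    con.execute("create table table1 (col1 integer, col2 text)")
    con.execute("create table table2 (col1 integer, col2 text)")
    con.executemany("insert into table1 values (?, 't1')",
                    [(i,) for i in range(n_table1)])
    con.executemany("insert into table2 values (?, 't2')",
                    [(i,) for i in range(n_table2)])
    rows = con.execute(
        """
        select col1, col2 from table1
        union all
        select t2.col1, t2.col2
        from (select t2.*, row_number() over (order by random()) as rn
              from table2 t2) t2
        inner join (select count(*) as cnt from table1) t1
                on t2.rn <= ? - t1.cnt
        """,
        (target,),
    ).fetchall()
    con.close()
    return rows

rows = fill_to_target(3, 50)
print(len(rows))
```

With 3 rows in table1 and 50 in table2, the result holds all 3 table1 rows and 7 randomly chosen table2 rows.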
Actually it might be simpler to select all rows from both tables, then order by and limit in the outer query:
insert into tableX (col1, col2)
select col1, col2
from (
select col1, col2, 't1' which from table1
union all
select col1, col2, 't2' from table2
) t
order by which, random()
limit 10000
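A quick way to convince yourself that the ordering trick keeps every table1 row is to run it on small data. This is only a sketch using Python's sqlite3 as a stand-in engine; the function name, the target of 10 and the tag column name src (used instead of which) are assumptions.

```python
import sqlite3

def union_then_limit(n_table1, n_table2, target=10):
    """Tag each row with its source, sort table1 first, then cut at target."""
    con = sqlite3.connect(":memory:")
    con.execute("create table table1 (col1 integer, col2 text)")
    con.execute("create table table2 (col1 integer, col2 text)")
    con.executemany("insert into table1 values (?, 'a')",
                    [(i,) for i in range(n_table1)])
    con.executemany("insert into table2 values (?, 'b')",
                    [(i,) for i in range(n_table2)])
    rows = con.execute(
        """
        select col1, col2
        from (
            select col1, col2, 't1' as src from table1
            union all
            select col1, col2, 't2' as src from table2
        ) t
        order by src, random()  -- 't1' sorts before 't2'; random() shuffles within each source
        limit ?
        """,
        (target,),
    ).fetchall()
    con.close()
    return rows

rows = union_then_limit(3, 50)
print(len(rows), sum(1 for r in rows if r[1] == "a"))
```

If the two tables together hold fewer rows than the target, the query simply returns everything.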

with inparms as (
select 10000 as target_rows
), acount as (
select count(*) as acount, inparms.target_rows
from tablea
cross join inparms
), btag as (
select b.*, 'tableb' as tabsource,
row_number() over (order by random()) as rnum
from tableb b
)
select a.*, 'tablea' as tabsource, row_number() over (order by 1) as rnum
from tablea a
union all
select b.*
from btag b
join acount a on b.rnum <= a.target_rows - a.acount
;

Related

SQL with having statement now want complete rows

Here is a mock table
MYTABLE ROWS
PKEY 1,2,3,4,5,6
COL1 a,b,b,c,d,d
COL2 55,44,33,88,22,33
I want to know which rows have duplicated COL1 values:
select col1, count(*)
from MYTABLE
group by col1
having count(*) > 1
This returns :
b,2
d,2
I now want all the rows that contain b and d. Normally, I would use where in stmt, but with the count column, not certain what type of statement I should use?
Maybe you need:
select * from MYTABLE
where col1 in
(
select col1
from MYTABLE
group by col1
having count(*) > 1
)
Use a CTE and a windowed aggregate:
WITH CTE AS(
SELECT Pkey,
Col1,
Col2,
COUNT(1) OVER (PARTITION BY Col1) AS C
FROM dbo.YourTable)
SELECT PKey,
Col1,
Col2
FROM CTE
WHERE C > 1;
There are lots of ways to solve this; here's another:
select * from MYTABLE
join
(
select col1 ,count(*)
from MYTABLE
group by col1
having count(*) > 1
) s on s.col1 = mytable.col1;
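The three approaches above can be checked against the mock table from the question. This is a small sketch using Python's sqlite3 as a stand-in engine; the variable names are assumptions, and an order by is added only to make the output deterministic.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table mytable (pkey integer primary key, col1 text, col2 integer)")
con.executemany(
    "insert into mytable values (?, ?, ?)",
    [(1, "a", 55), (2, "b", 44), (3, "b", 33), (4, "c", 88), (5, "d", 22), (6, "d", 33)],
)

# where ... in (grouped subquery)
in_rows = con.execute("""
    select * from mytable
    where col1 in (select col1 from mytable group by col1 having count(*) > 1)
    order by pkey
""").fetchall()

# windowed count in a CTE
window_rows = con.execute("""
    with cte as (
        select pkey, col1, col2, count(*) over (partition by col1) as c
        from mytable
    )
    select pkey, col1, col2 from cte where c > 1 order by pkey
""").fetchall()

# join against the grouped subquery
join_rows = con.execute("""
    select m.pkey, m.col1, m.col2
    from mytable m
    join (select col1 from mytable group by col1 having count(*) > 1) s
      on s.col1 = m.col1
    order by m.pkey
""").fetchall()

print(in_rows)
```

All three return the complete rows with pkey 2, 3, 5 and 6.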

sql sum of sums of two tables

how to get a sum of two sums from two tables?
SELECT (SELECT SUM(col) FROM table1) + SELECT (SUM(col) from table2)
doesn't work
You are very close. You just need parens around each subquery:
SELECT (SELECT SUM(col) FROM table1) + (SELECT SUM(col) from table2)
If either subquery could return NULL, you might prefer:
SELECT COALESCE(t1.s, 0) + COALESCE(t2.s, 0)
FROM (SELECT SUM(col) as s FROM table1) t1 CROSS JOIN
(SELECT SUM(col) as s from table2) t2;
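To see why the COALESCE version matters, here is a minimal sketch using Python's sqlite3 as a stand-in engine, with table2 left empty on purpose. The sample values and variable names are assumptions.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table table1 (col integer)")
con.execute("create table table2 (col integer)")
con.executemany("insert into table1 values (?)", [(1,), (2,), (3,)])
# table2 stays empty, so SUM(col) over it is NULL

# Plain addition: NULL from either side makes the whole result NULL
naive = con.execute(
    "select (select sum(col) from table1) + (select sum(col) from table2)"
).fetchone()[0]

# COALESCE each sum to 0 before adding
safe = con.execute("""
    select coalesce(t1.s, 0) + coalesce(t2.s, 0)
    from (select sum(col) as s from table1) t1
    cross join (select sum(col) as s from table2) t2
""").fetchone()[0]

print(naive, safe)
```

The plain version comes back as NULL (None in Python), while the COALESCE version returns the sum of table1 alone.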
According to this link, you can compute both sums with a cross join and add them:
SELECT T1C1 + T2C1 AS Total
FROM
( select SUM(Col1) T1C1 FROM T1 ) A
CROSS JOIN
( select SUM(Col1) T2C1 FROM T2 ) B
You can also visit these links:
Query SUM for two fields in two different tables
Getting the sum of several columns from two tables
SQL: How to to SUM two values from different tables
select coalesce(sum(x),0) from
(
Select sum(a) x from tab1
Union all
Select sum(b) from tab2
) Ilv

Getting only unique values in PostgreSQL

I need something like:
SELECT * FROM TABLE WHERE <value in column1 is always unique
(if ever any value will be noticed more than once, then skip this row)>
in postgresql.
So if I have these rows in table:
1;"something";"xoxox"
2;"other";"xoxox"
3;"something";"blablabla"
then running the query should return:
2;"other";"xoxox"
Any ideas?
Use count(*) as a window function:
select t.*
from (select t.*, count(*) over (partition by col1) as cnt
from t
) t
where cnt = 1;
Alternatively, you can use not exists and the id column:
select t.*
from t
where not exists (select 1 from t t2 where t2.col1 = t.col1 and t2.id <> t.id);
You can filter over count without the need of a subquery:
SELECT t.col1
FROM t
GROUP BY col1
HAVING COUNT(*) = 1
Other columns can be added by using an aggregation function like max, since there will only be 1 row per value:
SELECT t.col1, max(t.col2), max(t.col3)
FROM t
GROUP BY col1
HAVING COUNT(*) = 1
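The three variants above can be compared on the sample rows from the question. This is a sketch using Python's sqlite3 as a stand-in for PostgreSQL; the column names id, col1 and col2 and the variable names are assumptions.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table t (id integer primary key, col1 text, col2 text)")
con.executemany(
    "insert into t values (?, ?, ?)",
    [(1, "something", "xoxox"), (2, "other", "xoxox"), (3, "something", "blablabla")],
)

# count(*) as a window function
window_rows = con.execute("""
    select id, col1, col2
    from (select t.*, count(*) over (partition by col1) as cnt from t) x
    where cnt = 1
""").fetchall()

# not exists on the id column
not_exists_rows = con.execute("""
    select t.* from t
    where not exists (select 1 from t t2 where t2.col1 = t.col1 and t2.id <> t.id)
""").fetchall()

# group by / having, with max() to carry the other columns
group_rows = con.execute("""
    select max(id), col1, max(col2)
    from t
    group by col1
    having count(*) = 1
""").fetchall()

print(window_rows)
```

Each query keeps only the row whose col1 value appears exactly once.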

How to select non-distinct rows with a distinct on multiple columns

I have found many answers on selecting non-distinct rows that group by a single column, for example, e-mail. However, there seems to have been an issue in our system where we are getting some duplicate data in which everything is the same except the identity column.
SELECT DISTINCT
COLUMN1,
COLUMN2,
COLUMN3,
...
COLUMN14
FROM TABLE1
How can I get the non-distinct rows from the query above? Ideally it would include the identity column as currently that is obviously missing from the distinct query.
select COLUMN1,COLUMN2,COLUMN3
from TABLE_NAME
group by COLUMN1,COLUMN2,COLUMN3
having COUNT(*) > 1
With _cte (col1, col2, col3, cnt) As
(
Select col1, col2, col3, Count(*)
From mySchema.myTable
Group By Col1, Col2, Col3
Having Count(*) > 1
)
Select t.*
From _Cte As c
Join mySchema.myTable As t
On c.col1 = t.col1
And c.col2 = t.col2
And c.col3 = t.col3
SELECT * FROM
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY COL 1, COL 2, .... COL N ORDER BY COL M
) RN
FROM TABLE_NAME
)T
WHERE T.RN>1
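The join-back and ROW_NUMBER approaches return different sets: the first gives every row in a duplicated group (with its identity value), the second gives only the extra copies after the first. Here is a small sketch using Python's sqlite3 as a stand-in engine; the table layout (id, c1, c2, c3) and the sample rows are assumptions standing in for the 14 columns in the question.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table table1 (id integer primary key, c1 text, c2 text, c3 integer)")
con.executemany(
    "insert into table1 values (?, ?, ?, ?)",
    [(1, "a", "x", 1), (2, "a", "x", 1), (3, "b", "y", 2), (4, "a", "x", 1), (5, "c", "z", 3)],
)

# Every row that belongs to a duplicated (c1, c2, c3) group, identity included
all_dups = con.execute("""
    with dup (c1, c2, c3, cnt) as (
        select c1, c2, c3, count(*)
        from table1
        group by c1, c2, c3
        having count(*) > 1
    )
    select t.id
    from dup d
    join table1 t on d.c1 = t.c1 and d.c2 = t.c2 and d.c3 = t.c3
    order by t.id
""").fetchall()

# Only the surplus copies: everything after the first row of each group
extras = con.execute("""
    select id
    from (
        select id, row_number() over (partition by c1, c2, c3 order by id) as rn
        from table1
    ) x
    where rn > 1
    order by id
""").fetchall()

print(all_dups, extras)
```

The second form is handy when the goal is to delete the duplicates and keep one row per group.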

Select only when first select is null

I really don't get this. I tried coalesce(), but with no result...
I have one select (very simplified to understand the problem):
select col1,
col2
from table1 t1
where t1.col1='value'
and t1.col2='value2'
and t1.col3='value3'
I really need a result, so if this select returns no rows (and only then), the following select should run instead:
select col1,
col2
from table1 t1
where t1.col1='another_value'
and t1.col2='another_value2'
How can I combine these two into one big select? (Any recommended syntax is appreciated.)
Something like:
; WITH Base AS (
select col1,
col2
from table1 t1
where t1.col1='value'
and t1.col2='value2'
and t1.col3='value3'
)
, Base2 AS (
select col1,
col2
from table1 t1
where t1.col1='another_value'
and t1.col2='another_value2'
AND NOT EXISTS (SELECT 1 FROM Base) -- HERE!!!
)
SELECT * FROM Base
UNION
SELECT * FROM Base2
and let's hope the SQL optimizer won't run the first query twice :-)
It is a CTE (Common Table Expression). I used it so I could reference the first query twice (once in the EXISTS and once in the SELECT ... UNION).
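The fallback behaviour of the CTE version can be tried out on a few rows. This is a sketch using Python's sqlite3 as a stand-in for SQL Server; the function name, the constants and the sample rows are assumptions, and the predicates use the t1 alias consistently.

```python
import sqlite3

QUERY = """
with base as (
    select col1, col2
    from table1 t1
    where t1.col1 = 'value' and t1.col2 = 'value2' and t1.col3 = 'value3'
),
base2 as (
    select col1, col2
    from table1 t1
    where t1.col1 = 'another_value' and t1.col2 = 'another_value2'
      and not exists (select 1 from base)  -- only when the first select is empty
)
select * from base
union
select * from base2
"""

def select_with_fallback(rows):
    """Load rows into a fresh table1 and run the primary-or-fallback query."""
    con = sqlite3.connect(":memory:")
    con.execute("create table table1 (col1 text, col2 text, col3 text)")
    con.executemany("insert into table1 values (?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result

PRIMARY = ("value", "value2", "value3")
FALLBACK = ("another_value", "another_value2", "x")

print(select_with_fallback([PRIMARY, FALLBACK]))
print(select_with_fallback([FALLBACK]))
```

When a primary match exists, only it is returned; the fallback rows show up only when the first select is empty.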
Or by using a temporary table:
select col1,
col2
INTO #temp1 -- HERE!!!
from table1 t1
where t1.col1='value'
and t1.col2='value2'
and t1.col3='value3'
select col1,
col2
from table1 t1
where t1.col1='another_value'
and t1.col2='another_value2'
AND NOT EXISTS (SELECT 1 FROM #temp1) -- HERE!!!
It would help if your example included a little more information. Is there a common value between the two tables on which a JOIN can be established?
SELECT col1
,col2
FROM Table1 t1
WHERE t1.col1='value'
and t1.col2='value2'
and t1.col3='value3'
UNION
SELECT col1
,col2
FROM Table2 t2
WHERE t2.col1='another_value'
and t2.col2='another_value2'
AND NOT EXISTS (SELECT 1 FROM Table1 t1 WHERE t1.Col1 = t2.Col2)
You can use COALESCE, but each subquery must return a single column and at most one row, so apply it per column:
select COALESCE (
(select t1.col1
from table1 t1
where t1.col1='value'
and t1.col2='value2'
and t1.col3='value3')
,
(select t1.col1
from table1 t1
where t1.col1='another_value'
and t1.col2='another_value2')
) as col1
Repeat the same expression with col2 to get the second column.
Here is my ugly solution:
select top 1 with ties
col1,
col2
from table1
where (
col1='value'
and col2='value2'
and col3='value3'
) OR
(
col1='another_value'
and col2='another_value2'
)
order by
CASE
WHEN col1='value'
and col2='value2'
and col3='value3'
THEN 1
WHEN col1='another_value'
and col2='another_value2'
THEN 2 END
SQL Fiddle DEMO
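TOP 1 WITH TIES is SQL Server syntax. On engines without it, the same idea can be written with RANK() over the same CASE expression, keeping only rank 1. The sketch below swaps in that technique and runs it with Python's sqlite3; the function name, the constants and the sample rows are assumptions.

```python
import sqlite3

QUERY = """
select col1, col2
from (
    select col1, col2,
           rank() over (
               order by case
                   when col1 = 'value' and col2 = 'value2' and col3 = 'value3' then 1
                   when col1 = 'another_value' and col2 = 'another_value2' then 2
               end
           ) as rk
    from table1
    where (col1 = 'value' and col2 = 'value2' and col3 = 'value3')
       or (col1 = 'another_value' and col2 = 'another_value2')
) ranked
where rk = 1  -- all rows tied for the best available priority
"""

def select_top_with_ties(rows):
    """Return the rows of the highest-priority condition that matched anything."""
    con = sqlite3.connect(":memory:")
    con.execute("create table table1 (col1 text, col2 text, col3 text)")
    con.executemany("insert into table1 values (?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result

FIRST = ("value", "value2", "value3")
SECOND_A = ("another_value", "another_value2", "x")
SECOND_B = ("another_value", "another_value2", "y")

print(select_top_with_ties([FIRST, SECOND_A, SECOND_B]))
print(select_top_with_ties([SECOND_A, SECOND_B]))
```

Because ties share rank 1, every row matching the best available condition is returned, not just one.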