Postgres: pull two non-overlapping sets of data - sql

I would like to pull 30K rows at random from our data store to create one data set, then 30K more rows for a second data set that doesn't overlap any ids with the first.
My idea for how to make it work would be to somehow reference the id columns pulled in the first subquery when drawing the second subquery, then return their union:
SELECT * FROM (
SELECT id_col, A, B, C, 'group1' as label
FROM my_db
LIMIT 30000
) as t1
UNION ALL
(
SELECT id_col, A, B, C, 'group2' as label
FROM my_db
WHERE id_col NOT IN t1.id_col
LIMIT 30000
) as t2
But this does not work as I get "syntax error at or near t1" .
Updated: add label to column to show how a union would have created a tall format for the two groups.

As klin pointed out in the comments, you would need to use a Common Table Expression (CTE) in order to achieve your desired result:
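WITH t1 AS (
SELECT id_col, A, B, C, 'group1' AS label
FROM my_db
ORDER BY random() -- LIMIT on its own returns arbitrary rows, not a random sample
LIMIT 30000
), t2 AS (
SELECT id_col, A, B, C, 'group2' AS label
FROM my_db
WHERE id_col NOT IN (SELECT id_col FROM t1) -- exclude ids already drawn for group1
ORDER BY random()
LIMIT 30000
)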
SELECT id_col, A, B, C, label
FROM t1
UNION
SELECT id_col, A, B, C, label
FROM t2
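UNION ALL is not strictly required here: because the two labels differ and, assuming id_col is unique, no row can appear twice, a plain UNION returns the same rows. UNION ALL does, however, skip the duplicate-elimination pass that UNION performs, so it is the cheaper choice on 60K rows.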
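An alternative to the NOT IN approach is to shuffle the table once and split the result by row number, which makes the two groups disjoint by construction. The following is only a minimal sketch: it uses Python's sqlite3 module as a stand-in for Postgres so that it can be run anywhere, the 100 demo rows are made up, and `sample_two_groups` is a hypothetical helper name. The SQL itself (ROW_NUMBER() OVER (ORDER BY random())) should also be valid in Postgres, but that is an assumption to verify against your version.

```python
import sqlite3

def sample_two_groups(conn, n):
    """Return n random rows labelled 'group1' and n other rows labelled 'group2'."""
    sql = """
        WITH shuffled AS (
            -- Number every row once, in random order.
            SELECT id_col, A, B, C,
                   ROW_NUMBER() OVER (ORDER BY random()) AS rn
            FROM my_db
        )
        -- Rows 1..n form group1, rows n+1..2n form group2.
        SELECT id_col, A, B, C,
               CASE WHEN rn <= :n THEN 'group1' ELSE 'group2' END AS label
        FROM shuffled
        WHERE rn <= 2 * :n
    """
    return conn.execute(sql, {"n": n}).fetchall()

# Hypothetical demo data: 100 rows with unique ids.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_db (id_col INTEGER PRIMARY KEY, A INTEGER, B INTEGER, C INTEGER)"
)
conn.executemany(
    "INSERT INTO my_db VALUES (?, ?, ?, ?)",
    [(i, i, 2 * i, 3 * i) for i in range(1, 101)],
)

rows = sample_two_groups(conn, 30)
group1 = {r[0] for r in rows if r[4] == "group1"}
group2 = {r[0] for r in rows if r[4] == "group2"}
print(len(group1), len(group2), group1.isdisjoint(group2))
```

Because every row receives exactly one row number, no id can land in both groups, and the table is scanned and sorted only once.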

Related

Avoid duplicate columns select on nested select

In CrateDB, is there a way to avoid re-selecting the same column in a nested SELECT statement just to show its value in the results?
E.g. in the following query, is there any way to avoid re-selecting A and B through the nested SELECT? Ideally, it would be nice to select them just once in the first
SELECT
A,
B,
AB,
A * B * AB AS ABAB,
A / AB::DECIMAL AS AAB
FROM (
SELECT
A,
B,
(A + B) AS AB
FROM (
SELECT
(SELECT count(*) FROM schema.table_01 WHERE process_state IN ('State_1')) AS A,
(SELECT count(*) FROM schema.table_01 WHERE process_state IN ('State_2')) AS B
) alias_for_subquery_01
) alias_for_subquery_02
Thanks

Gridgain SQL query not working with union all and order by

SELECT a, b
FROM "table1".table1 table1
ORDER BY a DESC
UNION ALL
SELECT a, b
FROM "table1".table1 table1
ORDER BY a ASC
This query does not work. Individually, UNION ALL works and ORDER BY works, but they do not work together. Can someone please help?
Answered this question on GridGain forum
This should work:
SELECT a, b, b, NULL FROM "table1".table1 table1
UNION ALL
SELECT a, b, NULL, b FROM "table1".table1 table1
ORDER BY 3 DESC, 4 ASC
You can sort in an outer query:
select a, b
from (
select a, b, 0 x from table1
union all select a, b, 1 from table1
) t
order by
x,
case when x = 0 then a end desc,
case when x = 1 then a end
The individual queries of a UNION cannot each have their own ORDER BY.
Remove the ORDER BY from the separate queries and apply a single ORDER BY after the UNION; then it will work.
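To check the "sort in an outer query" idea without a GridGain cluster, here is a small sketch that runs the same query shape through Python's sqlite3 module. SQLite is only a stand-in, so GridGain may behave differently, and the three demo rows are made up. The marker column x keeps the two branches apart, and each CASE expression sorts only its own branch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (a INTEGER, b TEXT)")
# Hypothetical demo rows, inserted out of order on purpose.
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?)",
    [(2, "two"), (3, "three"), (1, "one")],
)

sql = """
    SELECT a, b
    FROM (
        SELECT a, b, 0 AS x FROM table1   -- first branch, to be sorted descending
        UNION ALL
        SELECT a, b, 1 AS x FROM table1   -- second branch, to be sorted ascending
    ) t
    ORDER BY
        x,
        CASE WHEN x = 0 THEN a END DESC,
        CASE WHEN x = 1 THEN a END ASC
"""
ordered_a = [row[0] for row in conn.execute(sql)]
print(ordered_a)  # [3, 2, 1, 1, 2, 3]
```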

Which select is faster in Sqlite?

a is a table index, b is a normal column.
select a,b from ( select a,b from table where a in (*listA*) ) where b in (*listB*)
or
select a,b from table where (a=listA[0] and b=listB[0]) or (a=listA[1] and b=listB[1])...
I am using pseudocode to represent a list declaration.
The first query is wrong because it never looks at the combinations of a and b.
To use a temp table, you have to join with it:
SELECT a, b
FROM MyTable
JOIN (SELECT 1 AS a, 2 AS b UNION ALL
SELECT 3, 4 UNION ALL
SELECT 5, 6 ...)
USING (a, b)
Which version is better optimized depends on too many factors; the only way to find out is to measure with representative data.
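The difference between the two queries can be seen on a tiny table. This is a sketch only: the table contents are made up, `select_pairs` is a hypothetical helper, and it builds the UNION ALL subquery from the join above with one bound-parameter branch per pair. SQLite limits the number of terms in a compound SELECT (500 by default), so very long lists would need a real temp table instead.

```python
import sqlite3

def select_pairs(conn, pairs):
    """Return rows of MyTable whose (a, b) equals one of the given pairs."""
    if not pairs:
        return []
    # One "SELECT ?, ?" branch per pair; the first branch names the columns.
    branches = ["SELECT ? AS a, ? AS b"] + ["SELECT ?, ?"] * (len(pairs) - 1)
    sql = (
        "SELECT a, b FROM MyTable JOIN ("
        + " UNION ALL ".join(branches)
        + ") USING (a, b) ORDER BY a, b"
    )
    params = [value for pair in pairs for value in pair]
    return conn.execute(sql, params).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (a INTEGER, b INTEGER)")
conn.execute("CREATE INDEX idx_a ON MyTable (a)")  # a is indexed, b is not
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?)",
    [(1, 2), (1, 4), (3, 2), (3, 4), (5, 6)],
)

# Two independent IN lists match every combination of the listed values.
in_lists = conn.execute(
    "SELECT a, b FROM MyTable WHERE a IN (1, 3) AND b IN (2, 4) ORDER BY a, b"
).fetchall()

# Joining against explicit pairs matches only (1, 2) and (3, 4).
paired = select_pairs(conn, [(1, 2), (3, 4)])

print(in_lists)  # [(1, 2), (1, 4), (3, 2), (3, 4)]
print(paired)    # [(1, 2), (3, 4)]
```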

How to select duplicate records without a primary key in SQL Server

If I run this query:
SELECT
a,
b,
c,
...
FROM [DMS].[dbo].[CreditDebitAdjustment]
I get 24197 records.
If I run this query:
SELECT DISTINCT
a,
b,
c,
...
FROM [DMS].[dbo].[CreditDebitAdjustment]
I get 24176 records.
How do I go about selecting only the rows that are identical?
SELECT
a,
b,
c
FROM [DMS].[dbo].[CreditDebitAdjustment]
group by a,b,c
having count(*) > 1
If you want to delete those duplicates, use
;WITH CTE AS
(
SELECT
a, b, c,
RowNum = ROW_NUMBER() OVER(PARTITION BY a,b,c ORDER BY ...(define how to order those rows)..)
FROM
[DMS].[dbo].[CreditDebitAdjustment]
)
DELETE FROM CTE
WHERE RowNum > 1
This "partitions" (groups) all your data by the tuple (a,b,c) and gives each row a number - starting at 1 for each new tuple.
So any row with a RowNum larger than 1 is a duplicate, and the DELETE removes it.
But really: any serious data table ought to have a proper primary key!
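Before deleting anything, it can help to confirm that the number of surplus rows matches the gap between the plain and the DISTINCT count (24197 - 24176 = 21 in the question). The sketch below shows that check on made-up data, using Python's sqlite3 module as a stand-in for SQL Server. The table name `adjustment` is hypothetical, and ordering by SQLite's rowid is an assumption made only because the demo needs some ordering; DELETE FROM a CTE is SQL Server syntax and is not attempted here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE adjustment (a INTEGER, b TEXT, c INTEGER)")
# Hypothetical rows: (1, 'x', 10) appears three times, (3, 'z', 30) twice.
conn.executemany(
    "INSERT INTO adjustment VALUES (?, ?, ?)",
    [(1, "x", 10), (1, "x", 10), (1, "x", 10),
     (2, "y", 20), (3, "z", 30), (3, "z", 30)],
)

total = conn.execute("SELECT COUNT(*) FROM adjustment").fetchone()[0]
distinct = conn.execute(
    "SELECT COUNT(*) FROM (SELECT DISTINCT a, b, c FROM adjustment)"
).fetchone()[0]

# Tuples that occur more than once, with their number of occurrences.
dup_groups = conn.execute(
    "SELECT a, b, c, COUNT(*) FROM adjustment "
    "GROUP BY a, b, c HAVING COUNT(*) > 1 ORDER BY a"
).fetchall()

# Rows numbered 2 and above within each (a, b, c) tuple are the surplus copies.
surplus = conn.execute("""
    SELECT COUNT(*) FROM (
        SELECT ROW_NUMBER() OVER (PARTITION BY a, b, c ORDER BY rowid) AS RowNum
        FROM adjustment
    ) WHERE RowNum > 1
""").fetchone()[0]

print(total, distinct, surplus)  # 6 3 3
print(dup_groups)                # [(1, 'x', 10, 3), (3, 'z', 30, 2)]
```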

In SQL Server, what's the best way to merge tables from multiple databases?

I'm sorry that I can't find a better title for my question. Now let me describe it in detail.
I have 4 databases: a, b, c and d. Database a has all the tables that appear in b, c and d, and they have the same structure with the same constraints (pk, fk, default, check). b, c and d each have only some of the tables that appear in a. There is already some data in a, b, c and d. The tables in b, c and d hold more data than their counterparts in a, and a probably has data duplicated in b, c and d.
Now what I want to do is export all the data in b, c and d and import it into a. I already have a solution, but I want to know what the best method is for such a complicated task.
Thanks.
The UNIONs (no ALL) in the subquery will remove duplicates. Then the IS NULL in the Where will only insert new rows into Table1.
Insert Into DatabaseA.dbo.Table1(ID, Value)
Select ID, Value
FROM (
Select ID, Value From DatabaseB.dbo.Table1
UNION
Select ID, Value From DatabaseC.dbo.Table1
UNION
Select ID, Value From DatabaseD.dbo.Table1
) T
LEFT JOIN DatabaseA.dbo.Table1 S ON T.ID = S.ID
WHERE S.ID IS NULL
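The same UNION plus LEFT JOIN ... IS NULL pattern can be tried out on a small scale. This is a sketch under stated assumptions: Python's sqlite3 module with attached in-memory databases stands in for the four SQL Server databases, the rows are made up, and SQLite uses two-part names (DatabaseB.Table1) where SQL Server uses DatabaseB.dbo.Table1.

```python
import sqlite3

# Autocommit mode, so ATTACH never runs inside an open transaction.
conn = sqlite3.connect(":memory:", isolation_level=None)  # "main" plays DatabaseA
conn.execute("CREATE TABLE Table1 (ID INTEGER PRIMARY KEY, Value TEXT)")
conn.executemany("INSERT INTO Table1 VALUES (?, ?)", [(1, "one"), (2, "two")])

# Hypothetical source databases with overlapping rows.
sources = {
    "DatabaseB": [(2, "two"), (3, "three")],
    "DatabaseC": [(3, "three"), (4, "four")],
    "DatabaseD": [(4, "four"), (5, "five")],
}
for name, rows in sources.items():
    conn.execute(f"ATTACH DATABASE ':memory:' AS {name}")
    conn.execute(f"CREATE TABLE {name}.Table1 (ID INTEGER PRIMARY KEY, Value TEXT)")
    conn.executemany(f"INSERT INTO {name}.Table1 VALUES (?, ?)", rows)

# UNION removes rows repeated across the sources; the anti-join keeps only new IDs.
conn.execute("""
    INSERT INTO main.Table1 (ID, Value)
    SELECT T.ID, T.Value
    FROM (
        SELECT ID, Value FROM DatabaseB.Table1
        UNION
        SELECT ID, Value FROM DatabaseC.Table1
        UNION
        SELECT ID, Value FROM DatabaseD.Table1
    ) T
    LEFT JOIN main.Table1 S ON T.ID = S.ID
    WHERE S.ID IS NULL
""")

merged_ids = [row[0] for row in conn.execute("SELECT ID FROM main.Table1 ORDER BY ID")]
print(merged_ids)  # [1, 2, 3, 4, 5]
```

Note that UNION only removes rows that are identical in every column. If two sources hold the same ID with different Value contents, both rows survive the UNION and the insert fails on the primary key, so such conflicts need to be resolved first.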
You can also use an INSERT INTO statement with a UNION that gathers the rows from the other databases:
Insert Into dbo.TableA(ID, Value)
Select ID, Value From DatabaseB.dbo.TableA
UNION ALL
Select ID, Value From DatabaseC.dbo.TableA
UNION ALL
Select ID, Value From DatabaseD.dbo.TableA