SQL UNION ALL only include newer entries from 'bottom' table - sql

Fair warning: I'm new to using SQL. I do so on an Oracle server either via AQT or with SQL Developer.
As I haven't been able to think or search my way to an answer, I put myself in your able hands...
I'd like to combine data from table A (high quality data) with data from table B (fresh data) such that entries from B are only included when their date stamps are later than those available in table A.
Both tables include entries from multiple entities, and the latest date stamp varies between those entities.
On the 4th of January, the tables may look something like:
A:
entity  date   type  value
X       1.jan  1     1
X       1.jan  0     1
X       2.jan  1     1
Y       1.jan  1     1
Y       3.jan  1     1

B:
entity  date   type  value
X       1.jan  1     2
X       1.jan  0     2
X       2.jan  1     2
X       3.jan  1     1   (new entry)
Y       1.jan  1     2
Y       3.jan  1     2
Y       4.jan  1     1   (new entry)
I have made an attempt at some code that I hope clarifies my need:
WITH
  AA AS (
    SELECT entity, date, SUM(value)
    FROM table_A
    GROUP BY entity, date),
  BB AS (
    SELECT entity, date, SUM(value)
    FROM table_B
    WHERE date > ALL (SELECT date FROM AA)
    GROUP BY entity, date)
SELECT * FROM (SELECT * FROM AA UNION ALL SELECT * FROM BB)
Now, if the WHERE date > ALL (SELECT date FROM AA) would work separately for each entity, I think I would have what I need.
That is, for each entity I want all entries from A, and only newer entries from B.
As the data in table A often differ from the data in B (values are often corrected), I don't think I can use something like table A UNION ALL (table B MINUS table A)?
Thanks

Essentially you are looking for entries in BB that do not exist in AA. When you use date > ALL (SELECT date FROM AA), the entity in question is not taken into account, so you will not get the correct records.
An alternative is to use a JOIN and filter out all entries that match AA.
Something like below:
WITH
  AA AS (
    SELECT entity, date, SUM(value) AS value
    FROM table_A
    GROUP BY entity, date),
  BB AS (
    SELECT B.entity, B.date, SUM(B.value) AS value
    FROM table_B B
    LEFT OUTER JOIN AA
      ON AA.entity = B.entity
     AND AA.date = B.date
    WHERE AA.date IS NULL        -- keep only B rows with no matching (entity, date) in AA
    GROUP BY B.entity, B.date)
SELECT * FROM (SELECT * FROM AA UNION ALL SELECT * FROM BB)

I find your question confusing, because I don't know where the aggregation is coming from.
The basic idea on getting newer rows from table_b uses conditions in the where clause, something like this:
select . . .
from table_a a
union all
select . . .
from table_b b
where b.date > (select max(a.date) from table_a a where a.entity = b.entity);
You can, of course, run this on your CTEs, if those are what you really want to combine.
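Applied to the aggregated CTEs from the question, that could look roughly like the sketch below (untested; it keeps the question's table_A/table_B names and the column name date, which on Oracle is actually a reserved word and would need to be quoted or renamed):
WITH
  AA AS (
    SELECT entity, date, SUM(value) AS total_value
    FROM table_A
    GROUP BY entity, date),
  BB AS (
    SELECT entity, date, SUM(value) AS total_value
    FROM table_B
    GROUP BY entity, date)
-- every aggregated row from A ...
SELECT entity, date, total_value FROM AA
UNION ALL
-- ... plus only those aggregated B rows that are newer than A's latest date for the same entity
SELECT entity, date, total_value
FROM BB
WHERE date > (SELECT MAX(AA.date)
              FROM AA
              WHERE AA.entity = BB.entity);
-- note: an entity that exists only in table_B would be dropped here,
-- because MAX() over zero rows is NULL and the comparison is then never true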

Use UNION instead of UNION ALL; it will remove the duplicate records:
SELECT * FROM (
SELECT *
FROM AA
UNION
SELECT *
FROM BB )

Related

SQL: select rows from a certain table based on conditions in this and another table

I have two tables that share IDs in a PostgreSQL database.
I would like to select certain rows from table A, based on condition Y (in table A) AND on condition Z in a different table (B).
For example:
Table A            Table B
ID | type          ID | date
0  | E             1  | 01.01.2022
1  | F             2  | 01.01.2022
2  | E             3  | 01.01.2010
3  | F
IDs MUST be unique - the same ID can appear only once in each table, and if the same ID is in both tables it means that both are referring to the same object.
Using an SQL query, I would like to find all cases where:
1 - the same ID exists in both tables
2 - type is F
3 - date is after 31.12.2021
And again, only rows from table A will be returned.
So the only returned row should be: 1 F
It is a bit hard to understand what problem you are actually facing, as this is very basic SQL.
Use EXISTS:
select *
from a
where type = 'F'
and exists (select null from b where b.id = a.id and dt >= date '2022-01-01');
Or IN:
select *
from a
where type = 'F'
and id in (select id from b where dt >= date '2022-01-01');
Or, as the IDs are unique in both tables, join:
select a.*
from a
join b on b.id = a.id
where a.type = 'F'
and b.dt >= date '2022-01-01';
My favorite here is the IN clause, because you want to select data from table A where conditions are met. So no join needed, just a where clause, and IN is easier to read than EXISTS.
SELECT *
FROM A
WHERE type = 'F'
  AND id IN (
    SELECT id
    FROM B
    WHERE DATE >= '2022-01-01' -- '2022' imo should be enough, need to check
  );
I don't think joining is necessary.

SQL SELECT repeating rows from table for specific time interval

I have a table and I want to find repeating rows for a specific time interval (DATE is an input parameter for the SQL query), listing all rows with the same PERSON and TYPE values.
ID DATE PERSON TYPE
1 01.01.2017 PERSON1 TYPE1
2 02.02.2017 PERSON1 TYPE1
3 03.03.2017 PERSON2 TYPE1
4 04.04.2017 PERSON2 TYPE2
5 05.05.2017 PERSON2 TYPE1
6 06.06.2017 PERSON1 TYPE2
So for example if DATE is between 01.01 and 04.04 it should list me rows with ID 1 and 2.
If DATE is between 01.01 and 06.06 it should list me rows with ID 1, 2, 3 and 5 because 1 and 2 have the same person and type in that interval and 3 and 5 have the same person and type in that interval.
SELECT ID FROM TABLE
WHERE DATE>='01.01.2017' AND DATE<='06.06.2017'
but I am not even sure how to start defining this repeating condition based on the PERSON and TYPE columns.
Maybe an INNER JOIN can help with this, referencing the same table and matching those two columns while the third column, ID, is different: TABLE.PERSON = TABLE.PERSON and TABLE.TYPE = TABLE.TYPE and TABLE.ID != TABLE.ID. Of course the table is the same, but a different alias can be used for this?
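To make that idea a bit more concrete, something like the following is what I have in mind (TABLE and DATE are just the placeholder names from above, and I am not sure it is correct):
SELECT DISTINCT t1.ID
FROM TABLE t1
INNER JOIN TABLE t2
        ON t2.PERSON = t1.PERSON   -- same person ...
       AND t2.TYPE   = t1.TYPE     -- ... and same type ...
       AND t2.ID     <> t1.ID      -- ... but a different row
WHERE t1.DATE >= '01.01.2017' AND t1.DATE <= '06.06.2017'
  AND t2.DATE >= '01.01.2017' AND t2.DATE <= '06.06.2017'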
Please try...
SELECT tableName.ID AS ID
FROM tableName
JOIN
(
    SELECT person,
           type,
           COUNT( person ) AS countOfPair
    FROM tableName
    WHERE date BETWEEN startDate AND endDate
    GROUP BY person,
             type
) tempTable ON tableName.person = tempTable.person AND
               tableName.type = tempTable.type
WHERE countOfPair >= 2
  AND tableName.date BETWEEN startDate AND endDate  -- repeat the interval so rows outside it are not returned
The inner SELECT gathers each combination of person and type in between your start and end dates (please replace startDate and endDate with however you are referencing those) and performs a count of them.
The outer SELECT statement's JOIN then has the effect of appending the count of each combination to the end of each row containing that combination. The outer SELECT then retrieves the ID from each row that has a repeated combination.
If you have any questions or comments, then please feel free to post a Comment accordingly.
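For reference, here is a self-contained sketch of the same idea with the sample rows from the question inlined (the sampleData CTE, the ISO date literals and the 01.01-06.06 interval are only for illustration; SQL Server or PostgreSQL syntax assumed). It should return IDs 1, 2, 3 and 5:
WITH sampleData (ID, date, person, type) AS (
    SELECT 1, '2017-01-01', 'PERSON1', 'TYPE1' UNION ALL
    SELECT 2, '2017-02-02', 'PERSON1', 'TYPE1' UNION ALL
    SELECT 3, '2017-03-03', 'PERSON2', 'TYPE1' UNION ALL
    SELECT 4, '2017-04-04', 'PERSON2', 'TYPE2' UNION ALL
    SELECT 5, '2017-05-05', 'PERSON2', 'TYPE1' UNION ALL
    SELECT 6, '2017-06-06', 'PERSON1', 'TYPE2'
)
SELECT sampleData.ID
FROM sampleData
JOIN (
    -- count how often each (person, type) pair occurs inside the interval
    SELECT person, type, COUNT(person) AS countOfPair
    FROM sampleData
    WHERE date BETWEEN '2017-01-01' AND '2017-06-06'
    GROUP BY person, type
) tempTable ON sampleData.person = tempTable.person
           AND sampleData.type   = tempTable.type
WHERE countOfPair >= 2
  AND sampleData.date BETWEEN '2017-01-01' AND '2017-06-06';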
You can try this (I don't know if your version has window/analytic functions):
(X is the name of your table)
SELECT Y.ID, Y.DATE, Y.PERSON, Y.TYPE
FROM (
SELECT *, COUNT(*) OVER (PARTITION BY PERSON, TYPE) AS RC
FROM X
WHERE DATE >='01.01.2017' AND DATE <='04.04.2017'
) Y
WHERE RC>1
Or this if it doesn't support them:
SELECT X.ID, X.DATE, X.PERSON, X.TYPE
FROM X
INNER JOIN (
SELECT PERSON, TYPE, COUNT(*) AS RC
FROM X
WHERE DATE >='01.01.2017' AND DATE <='04.04.2017'
GROUP BY PERSON, TYPE
) Y ON X.PERSON = Y.PERSON AND X.TYPE = Y.TYPE
WHERE RC>1
I suggest always using appropriate conversions for date datatypes.
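For example, assuming SQL Server, an explicit conversion could look like this (on Oracle it would be TO_DATE('01.01.2017', 'DD.MM.YYYY') instead):
SELECT X.ID, X.DATE, X.PERSON, X.TYPE
FROM X
WHERE X.DATE >= CONVERT(date, '20170101')   -- yyyymmdd literals are read the same way
  AND X.DATE <= CONVERT(date, '20170404')   -- regardless of the session's language/date settings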
Another method would be:
SELECT a.id
FROM tablename a NATURAL JOIN
(SELECT person,type FROM tablename
WHERE date>='01.01.2017' AND date<='06.06.2017'
GROUP BY person, type HAVING COUNT(*)>1) b ;
The NATURAL JOIN would automatically use columns person and type.
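If you prefer to be explicit about the join columns (NATURAL JOIN silently joins on every column name the two sides have in common, which can surprise you if more columns overlap), a sketch of the equivalent with an ordinary join would be:
SELECT a.id
FROM tablename a
JOIN (SELECT person, type
      FROM tablename
      WHERE date >= '01.01.2017' AND date <= '06.06.2017'
      GROUP BY person, type
      HAVING COUNT(*) > 1) b
  ON b.person = a.person
 AND b.type   = a.type;
-- as with the NATURAL JOIN version, repeat the date filter on a
-- if rows outside the interval should not be returned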
Add "DISTINCT" clause to avoid redundancy
SELECT DISTINCT ID FROM TABLE
WHERE DATE>='01.01.2017' AND DATE<='06.06.2017'

Aggregate column text where dates in table a are between dates in table b

Sample data
CREATE TEMP TABLE a AS
SELECT id, adate::date, name
FROM ( VALUES
(1,'1/1/1900','test'),
(1,'3/1/1900','testing'),
(1,'4/1/1900','testinganother'),
(1,'6/1/1900','superbtest'),
(2,'1/1/1900','thebesttest'),
(2,'3/1/1900','suchtest'),
(2,'4/1/1900','test2'),
(2,'6/1/1900','test3'),
(2,'7/1/1900','test4')
) AS t(id,adate,name);
CREATE TEMP TABLE b AS
SELECT id, bdate::date, score
FROM ( VALUES
(1,'12/31/1899', 7 ),
(1,'4/1/1900' , 45),
(2,'12/31/1899', 19),
(2,'5/1/1900' , 29),
(2,'8/1/1900' , 14)
) AS t(id,bdate,score);
What I want
What I need to do is aggregate column text from table a where the id matches table b and the date from table a is between the two closest dates from table b. Desired output:
id date score textagg
1 12/31/1899 7 test, testing
1 4/1/1900 45 testinganother, superbtest
2 12/31/1899 19 thebesttest, suchtest, test2
2 5/1/1900 29 test3, test4
2 8/1/1900 14
My thoughts are to do something like this:
create table date_join
select a.id, string_agg(a.text, ','), b.*
from tablea a
left join tableb b
on a.id = b.id
*having a.date between b.date and b.date*;
but I am really struggling with the last line, figuring out how to aggregate only where the date in table a is between the closest two dates in table b. Any guidance is much appreciated.
I can't promise it's the best way to do it, but this is a way to do it.
with b_values as (
    select
        id, bdate as from_date, score,
        lead (bdate, 1, '3000-01-01'::date)
            over (partition by id order by bdate) - 1 as thru_date
    from b
)
select
    bv.id, bv.from_date, bv.score,
    string_agg (a.name, ',')
from
    b_values as bv
    left join a on
        a.id = bv.id and
        a.adate between bv.from_date and bv.thru_date
group by
    bv.id, bv.from_date, bv.score
order by
    bv.id, bv.from_date
I'm presupposing you will never have a date in your table greater than 12/31/2999, so if you're still running this query after that date, please accept my apologies.
Here is the output I got when I ran this:
id from_date score string_agg
1 0 7 test,testing
1 92 45 testinganother,superbtest
2 0 19 thebesttest,suchtest,test2
2 122 29 test3,test4
2 214 14
I might also note that between in a join is a performance killer. If you have large data volumes, there might be better ideas on how to approach this, but that depends largely on what your actual data looks like.
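For what it's worth, one way to restructure it is sketched below (untested, against the same sample tables a and b): tag every a row with the start date of the b bucket it falls into via a correlated MAX(), then join back on plain equality. Whether that is actually faster than the BETWEEN join depends entirely on your indexes and data volumes.
with tagged as (
    select
        a.id,
        a.name,
        -- the start of the bucket this a-row belongs to:
        -- the latest b date for the same id that is not after the a date
        (select max(b2.bdate)
         from b b2
         where b2.id = a.id
           and b2.bdate <= a.adate) as from_date
    from a
)
select
    b.id, b.bdate, b.score,
    string_agg (t.name, ',') as textagg
from
    b
    left join tagged t on
        t.id = b.id and
        t.from_date = b.bdate     -- plain equality instead of a range condition
group by
    b.id, b.bdate, b.score
order by
    b.id, b.bdate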

How to write a LEFT JOIN in BigQuery's Standard SQL?

We have a query that works in BigQuery's Legacy SQL. How do we write it in Standard SQL so it works?
SELECT Hour, Average, L.Key AS Key FROM
(SELECT 1 AS Key, *
FROM test.table_L AS L)
LEFT JOIN
(SELECT 1 AS Key, Avg(Total) AS Average
FROM test.table_R) AS R
ON L.Key = R.Key ORDER BY Hour ASC
Currently the error it gives is:
Equality is not defined for arguments of type ARRAY<INT64> at [4:74]
BigQuery has two modes for queries: Legacy SQL and Standard SQL. We have looked at the BigQuery Standard SQL documentation and also see just one SO answer on Standard SQL joins in BigQuery - but so far, it is unclear to us what the key change needed might be.
Table_L looks like this:
Row Hour
1 A
2 B
3 C
Table_R looks like this:
Row Value
1 10
2 20
3 30
Results Desired:
Row Hour Average(OfR) Key
1 A 20 1
2 B 20 1
3 C 20 1
How do we rewrite this BigQuery Legacy SQL query to work in Standard SQL?
Based on your recent update in the question and the comments - try below:
WITH Table_L AS (
SELECT 1 AS Row, 'A' AS Hour UNION ALL
SELECT 2 AS Row, 'B' AS Hour UNION ALL
SELECT 3 AS Row, 'C' AS Hour
),
Table_R AS (
SELECT 1 AS Row, 10 AS Value UNION ALL
SELECT 2 AS Row, 20 AS Value UNION ALL
SELECT 3 AS Row, 30 AS Value
)
SELECT
Row,
Hour,
(SELECT AVG(Value) FROM Table_R) AS AverageOfR,
1 AS Key
FROM Table_L
The above is for testing; the query you should run in "production" is:
SELECT
Row,
Hour,
(SELECT AVG(Value) FROM Table_R) AS AverageOfR,
1 AS Key
FROM Table_L
In case for some reason you are bound to a JOIN, use the CROSS JOIN version below:
SELECT
Row,
Hour,
AverageOfR,
1 AS Key
FROM Table_L
CROSS JOIN ((SELECT AVG(Value) AS AverageOfR FROM Table_R))
or the LEFT JOIN version below with the Key field involved (in case the Key really is important for your logic - which somehow I feel is true):
SELECT
Row,
Hour,
AverageOfR,
L.Key AS Key
FROM (SELECT 1 AS Key, Row, Hour FROM Table_L) AS L
LEFT JOIN ((SELECT 1 AS Key, AVG(Value) AS AverageOfR FROM Table_R)) AS R
ON L.Key = R.Key
Your error message suggests that key is not a column in table_L. If not, then don't include it in the query.
It looks like you simply want the average of the total from table_R. You can approach this as:
SELECT l.*, r.average
FROM test.table_L as l CROSS JOIN
(SELECT Avg(Total) as average
FROM test.table_R
) R
ORDER BY l.hour ASC;

sql server getting first value when grouping

I have a table where column a has values that are not necessarily distinct, and column b has a number of distinct values for each value of a. I want a result in which each value of a appears only once, together with the first found value of b for that value of a. How do I do this in SQL Server 2000?
example table:
a b
1 aa
1 bb
2 zz
3 aa
3 zz
3 bb
4 bb
4 aa
Wanted result:
a b
1 aa
2 zz
3 aa
4 bb
In addition, I must add that the values in column b are all text values. I updated the example to reflect this.
Thanks
;with cte as
(
select *,
row_number() over(partition by a order by a) as rn
from yourtablename
)
select
a,b
from cte
where rn = 1
SQL does not know about ordering by table rows. You need to introduce order in the table structure (usually using an id column). That said, once you have an id column, it's rather easy:
SELECT a, b FROM test WHERE id in (SELECT MIN(id) FROM test GROUP BY a)
There might be a way to do this, using internal SQL Server functions. But this solution is portable and more easily understood by anyone who knows SQL.
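To make that concrete, here is a small self-contained sketch (the test table, the IDENTITY id column and the inserts are only for illustration):
CREATE TABLE test (
    id INT IDENTITY(1,1) PRIMARY KEY,   -- gives the rows an explicit order
    a  INT,
    b  VARCHAR(10)
);

INSERT INTO test (a, b) VALUES (1, 'aa');
INSERT INTO test (a, b) VALUES (1, 'bb');
INSERT INTO test (a, b) VALUES (2, 'zz');
INSERT INTO test (a, b) VALUES (3, 'aa');
INSERT INTO test (a, b) VALUES (3, 'zz');
INSERT INTO test (a, b) VALUES (3, 'bb');
INSERT INTO test (a, b) VALUES (4, 'bb');
INSERT INTO test (a, b) VALUES (4, 'aa');

-- "first" value of b per a = the b of the row with the smallest id for that a
SELECT a, b
FROM test
WHERE id IN (SELECT MIN(id) FROM test GROUP BY a);
-- returns (1, aa), (2, zz), (3, aa), (4, bb)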