select count(ID) where ID IN a or b - sql

I don't understand what I'm doing wrong. I'm trying to get a weekly COUNT of every ID that meets criteria A OR criteria B.
select CREATE_WEEK, count ( A.PK )
from TABLE1 A
where ( A.PK not in (select distinct ( B.FK )
from TABLE2 B
where B.CREATE_TIMESTAMP > '01-Jan-2013')
or A.PK in (select A.PK
from ( select A.PK, A.CREATE_TIMESTAMP as A_CRT, min ( B.CREATE_TIMESTAMP ) as FIRST_B
from TABLE1 A, TABLE2 B
where A.PK = B.FK
and A.CREATE_TIMESTAMP > '01-Jan-2013'
and B.CREATE_TIMESTAMP > '01-Jan-2013'
group by A.PK, A.CREATE_TIMESTAMP)
where A_CRT < FIRST_B) )
and A.CREATE_TIMESTAMP > '01-Jan-2013'
and CREATE_WEEK >= 2
and THIS_WEEK - CREATE_WEEK >= 1
group by CREATE_WEEK
order by CREATE_WEEK asc
Note: PK in table1 = FK in table2, so in the first subquery I'm checking whether the PK from table1 exists as an FK in table2. The week comes from TO_CHAR (TO_DATE (TRUNC (A.CREATE_TIMESTAMP, 'IW')), 'IW').
When I take out the OR and run the query with either subquery alone, the results come back in 1-2 seconds. But when I run the combined query, nothing has come back after 20 minutes.
I know I can run them separately and then sum them in a spreadsheet, but I'd rather just get one number.

I'm trying to get a weekly COUNT of every ID that meets criteria A OR criteria B
However, your code is:
ID NOT IN (subquery A) OR ID IN (subquery B)
The NOT is at odds with your requirement.
Assuming you want an ID that meets both criteria to be counted only once, use:
ID in (
select ... -- this is subquery A
union
select ... -- this is subquery B
)
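A minimal sketch of the UNION approach that runs as-is in SQLite. The tables table1 and table2, the columns pk, fk and create_ts, and the sample rows are hypothetical stand-ins for the question's TABLE1 and TABLE2. The sketch also swaps NOT IN for NOT EXISTS: if the subquery behind NOT IN returns even one NULL, x NOT IN (...) can never be true, so that criterion silently matches nothing, while NOT EXISTS is unaffected. SQLite's strftime('%W', ...) stands in for Oracle's TO_CHAR(..., 'IW'), and its week numbers are not ISO weeks. Evaluating each criterion as a separate branch may also help with the slowness, because the optimizer no longer has to handle an OR across two subqueries, but that is an expectation, not a measurement.

```sql
-- Hypothetical stand-ins for TABLE1 / TABLE2 (SQLite syntax, ISO date strings).
CREATE TABLE table1 (pk INTEGER PRIMARY KEY, create_ts TEXT NOT NULL);
CREATE TABLE table2 (fk INTEGER, create_ts TEXT NOT NULL);

INSERT INTO table1 (pk, create_ts) VALUES
  (1, '2013-01-08'),   -- no table2 row at all: meets criterion A
  (2, '2013-01-09'),   -- created before its first table2 row: meets criterion B
  (3, '2013-01-16'),   -- a table2 row precedes it: meets neither
  (4, '2013-01-17');   -- meets criterion B
INSERT INTO table2 (fk, create_ts) VALUES
  (2, '2013-01-10'),
  (3, '2013-01-15'),
  (4, '2013-01-20'),
  (NULL, '2013-02-01'); -- a NULL fk makes "pk NOT IN (SELECT fk ...)" match nothing

WITH matching_ids AS (
  -- Criterion A: no table2 row after the cutoff (NOT EXISTS ignores NULL fks)
  SELECT a.pk
  FROM table1 a
  WHERE a.create_ts > '2013-01-01'
    AND NOT EXISTS (SELECT 1
                    FROM table2 b
                    WHERE b.fk = a.pk
                      AND b.create_ts > '2013-01-01')
  UNION  -- UNION (not UNION ALL) keeps an ID that meets both criteria only once
  -- Criterion B: created before its first table2 row
  SELECT a.pk
  FROM table1 a
  JOIN table2 b ON b.fk = a.pk
  WHERE a.create_ts > '2013-01-01'
    AND b.create_ts > '2013-01-01'
  GROUP BY a.pk, a.create_ts
  HAVING a.create_ts < MIN(b.create_ts)
)
SELECT strftime('%W', a.create_ts) AS create_week,
       COUNT(*) AS id_count
FROM table1 a
JOIN matching_ids m ON m.pk = a.pk
GROUP BY create_week
ORDER BY create_week;
```

With these rows the query returns week 01 with 2 IDs (pk 1 and 2) and week 02 with 1 ID (pk 4). Writing criterion A as pk NOT IN (SELECT fk FROM table2 WHERE create_ts > '2013-01-01') instead makes it match no rows at all, because of the NULL fk.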

Related

BigQuery: WHERE clause using column from outside the subquery

I'm new to BigQuery, and googling hasn't pointed me to a solution to this problem.
I am trying to use a WHERE clause in a subquery to filter and pick the latest row for each row in the main query. In PostgreSQL I'd normally do it like this:
SELECT
*
FROM
table_a AS a
LEFT JOIN LATERAL
(
SELECT
score,
CONCAT( 'AB', id ) AS id
FROM
table_b AS b
WHERE
id = a.company_id
and
b.date < a.date
ORDER BY
b.date DESC
LIMIT
1
) AS latest ON true
WHERE
latest.id LIKE 'AB%'
ORDER BY
createdAt DESC
So this essentially runs the subquery for each row of table A and picks the latest row from table B dated before that row's date.
So if table A had the row
id  date
12  2021-05-XX
and table B had:
id  date        value
12  2022-01-XX  99
12  2021-02-XX  98
12  2020-03-XX  97
12  2019-04-XX  96
it would join only the row with 2021-02-XX to table A.
In another example, with
Table A:
id  date
15  2021-01-XX
Table B:
id  date        value
15  2022-01-XX  99
15  2021-02-XX  98
15  2020-03-XX  97
15  2019-04-XX  96
it would join only the row with date 2020-03-XX, value 97.
I hope that's clear; I'm not really sure how to write this query so that it works.
Thanks for the help!
You can replace some of your correlated subquery logic with a simple join and a QUALIFY clause.
Try the following:
SELECT *
FROM table_a a
LEFT JOIN table_b b
ON a.id = b.id
WHERE b.date < a.date
QUALIFY ROW_NUMBER() OVER (PARTITION BY b.id ORDER BY b.date desc) = 1
With your sample data it produces: (result screenshot not preserved)
This should work for both truncated dates (YYYY-MM) and full dates (YYYY-MM-DD).
Something like the following should work for your requirements:
WITH
latest_record AS (
SELECT
a.id,
value,b.date, a.createdAt
FROM
`gcp-project-name.data-set-name.A` AS a
JOIN
`gcp-project-name.data-set-name.B` b
ON
( a.id = b.id
AND b.date < a.updatedAt )
ORDER BY
b.date DESC
LIMIT
1 )
SELECT
*
FROM
latest_record
I ran this with sample data for table A and table B (the screenshots of both tables and of the result were not preserved).
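A portable sketch of the same "latest earlier B row for each A row" logic, written for SQLite, which has ROW_NUMBER but no QUALIFY, so the filter moves into a derived table. Two choices matter here: the date condition goes in the ON clause of the LEFT JOIN, so an A row with no earlier B row is still kept (as with LEFT JOIN LATERAL ... ON true), and the window is partitioned by the A row (a.id, a.date), so each A row gets its own latest match. The ids and values are the question's; the days replacing XX are hypothetical.

```sql
-- Sample rows from the question; the "XX" days are hypothetical (SQLite syntax).
CREATE TABLE table_a (id INTEGER, date TEXT);
CREATE TABLE table_b (id INTEGER, date TEXT, value INTEGER);

INSERT INTO table_a (id, date) VALUES (12, '2021-05-10'), (15, '2021-01-10');
INSERT INTO table_b (id, date, value) VALUES
  (12, '2022-01-10', 99), (12, '2021-02-10', 98),
  (12, '2020-03-10', 97), (12, '2019-04-10', 96),
  (15, '2022-01-10', 99), (15, '2021-02-10', 98),
  (15, '2020-03-10', 97), (15, '2019-04-10', 96);

SELECT id, a_date, b_date, value
FROM (
  SELECT a.id,
         a.date AS a_date,
         b.date AS b_date,
         b.value,
         -- rank the earlier B rows of each A row, newest first
         ROW_NUMBER() OVER (PARTITION BY a.id, a.date
                            ORDER BY b.date DESC) AS rn
  FROM table_a a
  LEFT JOIN table_b b
    ON b.id = a.id
   AND b.date < a.date   -- in ON, not WHERE, so A rows without a match survive
) ranked
WHERE rn = 1
ORDER BY id;
```

This returns value 98 (2021-02-10) for id 12 and value 97 (2020-03-10) for id 15, matching the two examples in the question.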

Joining tables that compute values between dates

So I have the following two tables:
Table A
Date num
01-16-15 10
02-20-15 12
03-20-15 13
Table B
Date Value
01-02-15 100
01-03-15 101
. .
01-17-15 102
01-18-15 103
. .
02-22-15 104
. .
03-20-15 110
I want to create a table with the following output in Impala:
Date Value
01-17-15 102*10
01-18-15 103*10
02-22-15 104*12
. .
. .
So the idea is that we only consider dates between 01-16-15 and 02-20-15, and between 02-20-15 and 03-20-15, exclusive. For each period we take the num from its starting date (say 01-16-15, num 10) and multiply it by the value of every day in the period, i.e. 1-16 to 2-20.
I understand it should be done with a join, but I am not sure how to join in this case.
Thanks!
Hmmm. In standard SQL you can do:
select b.*,
(select a.num
from a
where a.date <= b.date
order by a.date desc
fetch first 1 row only
) * value as new_value
from b;
I don't think this meets the range conditions, but I don't understand your description of that.
I also don't know if Impala supports correlated subqueries. An alternative is probably faster on complex data:
with ab as (
select a.date, a.num as a_value, null as b_value, 'a' as which
from a
union all
select b.date, null as a_value, b.value as b_value, 'b' as which
from b
)
select date, b_value * a_real_value
from (select ab.*,
max(a_value) over (partition by a_date) as a_real_value
from (select ab.*,
-- running max of the A dates so far; on the same date the 'a' row sorts first
max(case when which = 'a' then date end) over (order by date, which) as a_date
from ab
) ab
) ab
where which = 'b';
This works on MariaDB (MySQL), and it's pretty basic, so hopefully it works on Impala too:
SELECT b.date, b.value * a.num
FROM tableB b, tableA a
WHERE b.date >= a.date
AND (b.date < (SELECT MIN(c.date) FROM tableA c WHERE c.date > a.date)
OR NOT EXISTS(SELECT c.date FROM tableA c WHERE c.date > a.date))
The final NOT EXISTS is needed to include dates after the last date in table A.
Update
In the revised version of the question, no date in B is later than the last date in A, so the query can be written as:
SELECT b.date, b.value * a.num
FROM tableB b, tableA a
WHERE b.date >= a.date
AND b.date <= (SELECT MIN(c.date) FROM tableA c WHERE c.date > a.date)
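A different technique from the answers above, sketched for SQLite: turn table A into explicit periods with the LEAD window function, then range-join table B against them. It assumes the dates are stored as ISO strings (MM-DD-YY strings only sort correctly within a single year) and uses exclusive bounds on both ends, as the question describes; the rows are a subset of the question's data, and whether the same syntax runs unchanged on Impala is an assumption to check.

```sql
-- Subset of the question's data, with dates rewritten as ISO strings (SQLite syntax).
CREATE TABLE table_a (date TEXT, num INTEGER);
CREATE TABLE table_b (date TEXT, value INTEGER);

INSERT INTO table_a (date, num) VALUES
  ('2015-01-16', 10), ('2015-02-20', 12), ('2015-03-20', 13);
INSERT INTO table_b (date, value) VALUES
  ('2015-01-02', 100), ('2015-01-17', 102), ('2015-01-18', 103),
  ('2015-02-22', 104), ('2015-03-20', 110);

WITH periods AS (
  -- each A row opens a period that ends where the next A row starts
  SELECT date AS start_date,
         LEAD(date) OVER (ORDER BY date) AS end_date,  -- NULL for the last period
         num
  FROM table_a
)
SELECT b.date, b.value * p.num AS new_value
FROM table_b b
JOIN periods p
  ON b.date > p.start_date                            -- exclusive start
 AND (p.end_date IS NULL OR b.date < p.end_date)      -- exclusive end
ORDER BY b.date;
```

With these rows, 2015-01-17 gives 1020 (102*10), 2015-01-18 gives 1030 (103*10) and 2015-02-22 gives 1248 (104*12); 2015-01-02 falls before the first period and 2015-03-20 sits exactly on a boundary, so both are excluded.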

Return groupings where at least one row per group satisfies a condition SQL server

I have a table (let's call it TableA) with multiple columns, Id being the unique identifier.
I'm only interested in three of them: A (int), B (int), C (varchar).
First, I want to pick out the (A, B) combinations that occur in at least 2 rows:
; WITH CTE AS (
SELECT tbl.A, tbl.B
FROM [dbo].[TableA] tbl
/* WHERE irrelevant filter here */
GROUP BY tbl.A, tbl.B
HAVING COUNT(1) > 1
)
From there, I want to return the groupings identified in this CTE where AT LEAST one row in the grouping has its C column set to 'ThisValue'.
Use SUM with CASE to count the records that have 'ThisValue':
; WITH CTE AS (
SELECT tbl.A, tbl.B
FROM [dbo].[TableA] tbl
/* WHERE irrelevant filter here */
GROUP BY tbl.A, tbl.B
HAVING COUNT(1) > 1 and sum(case tbl.C when 'ThisValue' then 1 else 0 end)>0
)
SELECT * FROM CTE;
You can also, for example, use the EXISTS operator:
; WITH CTE AS (
SELECT tbl.A, tbl.B
FROM [dbo].[TableA] tbl
/* WHERE irrelevant filter here */
GROUP BY tbl.A, tbl.B
HAVING COUNT(1) > 1
)
SELECT * FROM CTE
WHERE EXISTS (SELECT 1 FROM [dbo].[TableA] tc
WHERE tc.A=CTE.A AND tc.B=CTE.B AND tc.C='ThisValue');
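A small sketch that runs both answers against the same hypothetical rows, in SQLite syntax rather than T-SQL (the [dbo]. schema prefix is dropped). Both queries should return only the group (A, B) = (1, 1): group (2, 2) has two rows but no 'ThisValue', and group (3, 3) has 'ThisValue' but only one row.

```sql
-- Hypothetical rows (SQLite syntax; the T-SQL [dbo]. prefix is dropped).
CREATE TABLE TableA (Id INTEGER PRIMARY KEY, A INTEGER, B INTEGER, C TEXT);
INSERT INTO TableA (Id, A, B, C) VALUES
  (1, 1, 1, 'ThisValue'), (2, 1, 1, 'Other'),   -- 2 rows, one matches: kept
  (3, 2, 2, 'Other'),     (4, 2, 2, 'Other'),   -- 2 rows, none match: dropped
  (5, 3, 3, 'ThisValue');                       -- only 1 row: dropped

-- Conditional aggregation: count the matching rows inside each group
SELECT A, B
FROM TableA
GROUP BY A, B
HAVING COUNT(*) > 1
   AND SUM(CASE C WHEN 'ThisValue' THEN 1 ELSE 0 END) > 0;

-- EXISTS: filter the duplicated groups afterwards
WITH CTE AS (
  SELECT A, B
  FROM TableA
  GROUP BY A, B
  HAVING COUNT(*) > 1
)
SELECT A, B
FROM CTE
WHERE EXISTS (SELECT 1 FROM TableA tc
              WHERE tc.A = CTE.A AND tc.B = CTE.B AND tc.C = 'ThisValue');
```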

how to SUM two columns in different tables between two dates

This is my query, but the result is wrong when the two tables return different numbers of rows; for example, whenever tableA selects 2 rows and tableB selects 3, the result is wrong.
select sum(tableA.value)+sum(tableB.value1)
from tableA,tableB
where tableA.data between '2016-01-21' and '2016-03-09'
and tableB.date2 between '2016-01-21' and '2016-03-09'
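You need to do the sums in subqueries before joining. The comma join pairs every tableA row with every tableB row, so when tableA returns 2 rows and tableB returns 3, each tableA value is added 3 times and each tableB value 2 times. A simple rule: never use commas in the FROM clause.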
select coalesce(avalue, 0) + coalesce(bvalue, 0)
from (select sum(a.value) as avalue
from tableA a
where a.data between '2016-01-21' and '2016-03-09'
) a cross join
(select sum(b.value1) as bvalue
from tableB b
where b.date2 between '2016-01-21' and '2016-03-09'
) b;
OK, here's my understanding.
You are trying to sum two columns from two different tables and then add the two sums together, right? Correct me if I am wrong. If that is the case, then
a simple subquery can come to your rescue:
Select
coalesce((Select SUM(value) From tableA
where data between '2016-01-21' and '2016-03-09'), 0) +
coalesce((Select SUM(value1) From tableB
where date2 between '2016-01-21' and '2016-03-09'), 0) FinalValue
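A sketch that reproduces the problem with hypothetical rows (2 in tableA, 3 in tableB, all inside the date range; SQLite syntax) and shows why aggregating each table before combining fixes it.

```sql
-- Hypothetical rows: 2 in tableA and 3 in tableB, all inside the date range (SQLite syntax).
CREATE TABLE tableA (data TEXT, value INTEGER);
CREATE TABLE tableB (date2 TEXT, value1 INTEGER);
INSERT INTO tableA (data, value) VALUES ('2016-02-01', 1), ('2016-02-02', 2);
INSERT INTO tableB (date2, value1) VALUES
  ('2016-02-01', 10), ('2016-02-02', 20), ('2016-02-03', 30);

-- Comma join: 2 x 3 = 6 rows, so each tableA value is summed 3 times
-- and each tableB value 2 times: 3*3 + 60*2 = 129
SELECT SUM(tableA.value) + SUM(tableB.value1) AS wrong_total
FROM tableA, tableB
WHERE tableA.data BETWEEN '2016-01-21' AND '2016-03-09'
  AND tableB.date2 BETWEEN '2016-01-21' AND '2016-03-09';

-- Aggregate each table first, then combine: 3 + 60 = 63
SELECT COALESCE((SELECT SUM(value) FROM tableA
                 WHERE data BETWEEN '2016-01-21' AND '2016-03-09'), 0)
     + COALESCE((SELECT SUM(value1) FROM tableB
                 WHERE date2 BETWEEN '2016-01-21' AND '2016-03-09'), 0) AS total;
```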

PostgreSQL: how to use NOT IN without WHERE?

I have two queries:
select * from tableA
and
select a,b from tableA
group by a,b
the first query returns 2101 rows
the second query returns 2100 rows
I want to know which row is in the first result but not in the second. It should be simple with NOT IN, but I can't find the correct syntax, because NOT IN has to be in a WHERE clause and I don't have a WHERE clause in my case.
There are many ways to do that; one of the simplest is to find the (a, b) groups that have a count > 1:
select a,b from tableA
group by a,b
having count(*) > 1
Here is a sample:
with tableA as
(
select * from (values
(1,1,1),
(1,1,1),
(1,2,1)
) as t(a,b,c)
)
select a, b from tableA
group by a, b
having count(*) > 1;
You can get duplicates this way:
select a,b from tableA
group by a,b having count(1) > 1
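The GROUP BY queries above return the duplicated (a, b) pairs but not the full rows behind them. A sketch that lists every row whose (a, b) pair occurs more than once, using COUNT(*) as a window function; the rows are hypothetical and the query is written for SQLite, though the same query should also run in PostgreSQL. In PostgreSQL, SELECT a, b FROM tableA EXCEPT ALL SELECT a, b FROM tableA GROUP BY a, b is another way to get exactly the extra (a, b) rows without a WHERE clause.

```sql
-- Hypothetical rows: (a, b) = (1, 1) appears twice with different c (SQLite syntax).
CREATE TABLE tableA (a INTEGER, b INTEGER, c INTEGER);
INSERT INTO tableA (a, b, c) VALUES (1, 1, 1), (1, 1, 2), (1, 2, 1);

-- Count rows per (a, b) without collapsing them, then keep the duplicated groups
SELECT a, b, c
FROM (
  SELECT a, b, c,
         COUNT(*) OVER (PARTITION BY a, b) AS rows_in_group
  FROM tableA
) counted
WHERE rows_in_group > 1
ORDER BY a, b, c;
```

With these rows it returns (1, 1, 1) and (1, 1, 2), both rows of the repeated (1, 1) pair.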