Athena looking for records with different start dates - sql

I have a lot of customer files with customer data that includes a customer ID, which can have multiple service points. A service point can have a meter, and a meter can have a meter install date:
Cust  Service Point  Meter ID  Meter Install Date
----  -------------  --------  ------------------
1     A1             AM1       20201005
1     A1             AM1       20201005
1     A1             AM1       20201005
1     A1             AM1       20150101
1     A1             AM1       20150101
1     A1             AM1       20150101
1     A2             AM2       20220110
1     A2             AM2       20220110
1     A2             AM2       20220110
1     A2             AM21      20230215
1     A3             AM3       20200509
1     A3             AM3       20200509
1     A3             AM3       20200509
1     A3             AM3       20221013
I'm trying to find the number of meters that have multiple install dates. It is not uncommon to have multiple rows where the information in these fields is duplicated. As I try different strategies I get different answers, so I'm doing something wrong.
I've tried:
select customer_id, service_point_id, secondary_sp_id
from customer
where secondary_sp_id in (
    select secondary_sp_id
    from customer
    group by secondary_sp_id
    having length(secondary_sp_id) > 1 and count(distinct meter_install_date) > 1
);
select customer_id, service_point_id, secondary_sp_id, meter_install_date
from customer
where secondary_sp_id in (
    select secondary_sp_id
    from customer
    group by secondary_sp_id
    having count(distinct meter_install_date) > 1
);
select a.service_point_id, a.secondary_sp_id, a.meter_install_date
from customer a, customer b
where a.service_point_id = b.service_point_id
  and a.secondary_sp_id = b.secondary_sp_id
  and a.meter_install_date != b.meter_install_date
group by a.service_point_id, a.secondary_sp_id, a.meter_install_date;
I would expect to get back:
Cust  Service Point  Meter ID  Meter Install Date
----  -------------  --------  ------------------
1     A1             AM1       20201005
1     A1             AM1       20150101
1     A3             AM3       20200509
1     A3             AM3       20221013
I don't think I'm handling when a service point has multiple meters and one of those meters has multiple start dates. Thanks for your help!

I'm not sure we have enough information about your data or schema, such as how "secondary_sp_id" fits into this. No details were provided on that column nor on the prod_peco_customer table.
If we assume your data appears like your first formatted section in the question, then the following CTE would work as-is.
create table customer (
    cust integer,
    service_point varchar(5),
    meter_id varchar(5),
    meter_install_date date
);
insert into customer values
    (1, 'A1', 'AM1', '20201005'),
    (1, 'A1', 'AM1', '20150101'),
    (1, 'A2', 'AM2', '20230110');
with target_meters as (
    select meter_id
    from customer
    group by meter_id
    having count(distinct meter_install_date) > 1
)
select c.*
from customer c
join target_meters t
  on c.meter_id = t.meter_id;
cust  service_point  meter_id  meter_install_date
----  -------------  --------  ------------------------
1     A1             AM1       2020-10-05T00:00:00.000Z
1     A1             AM1       2015-01-01T00:00:00.000Z
But I kinda doubt your data looks like this, even though you formatted it that way in the question. Adjust accordingly, but the main point is that you can use a subquery or CTE to identify the meters with multiple install dates.
----------Update-----------
Based on the updated sample data, then you would simply need to change select c.* to select distinct c.* such as this...
with target_meters as (
    select meter_id
    from customer
    group by meter_id
    having count(distinct meter_install_date) > 1
)
select distinct c.*
from customer c
join target_meters t
  on c.meter_id = t.meter_id
order by 1, 2, 3, 4;
cust  service_point  meter_id  meter_install_date
----  -------------  --------  ------------------------
1     A1             AM1       2015-01-01T00:00:00.000Z
1     A1             AM1       2020-10-05T00:00:00.000Z
1     A3             AM3       2020-05-09T00:00:00.000Z
1     A3             AM3       2022-10-13T00:00:00.000Z
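To check the updated query against the full sample from the question, here is a minimal sketch that runs it with Python's built-in sqlite3 module. SQLite is only a stand-in for Athena here, the dates are stored as text, and grouping by both service_point and meter_id (rather than meter_id alone) is my assumption to guard against a meter ID being reused across service points. The last query counts the meters, since that is what the question asks for.

```python
import sqlite3

# Sample rows copied from the question, duplicates included on purpose.
rows = (
    [(1, "A1", "AM1", "20201005")] * 3
    + [(1, "A1", "AM1", "20150101")] * 3
    + [(1, "A2", "AM2", "20220110")] * 3
    + [(1, "A2", "AM21", "20230215")]
    + [(1, "A3", "AM3", "20200509")] * 3
    + [(1, "A3", "AM3", "20221013")]
)

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table customer (cust integer, service_point text, "
    "meter_id text, meter_install_date text)"
)
conn.executemany("insert into customer values (?, ?, ?, ?)", rows)

# Assumption: a meter is identified by (service_point, meter_id).
detail_query = """
with target_meters as (
    select service_point, meter_id
    from customer
    group by service_point, meter_id
    having count(distinct meter_install_date) > 1
)
select distinct c.*
from customer c
join target_meters t
  on c.service_point = t.service_point
 and c.meter_id = t.meter_id
order by 1, 2, 3, 4
"""
result = conn.execute(detail_query).fetchall()
for row in result:
    print(row)

# Number of meters with more than one distinct install date.
count_query = """
select count(*)
from (
    select service_point, meter_id
    from customer
    group by service_point, meter_id
    having count(distinct meter_install_date) > 1
) t
"""
meter_count = conn.execute(count_query).fetchone()[0]
print(meter_count)
```

The duplicated rows do not inflate the count because the HAVING clause counts distinct install dates, and SELECT DISTINCT collapses the repeated detail rows.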

Related

How to retrieve historical data based on condition on one row?

I have a table historical_data
ID  Date        column_a  column_b
--  ----------  --------  --------
1   2011-10-01  a         a1
1   2011-11-01  w         w1
1   2011-09-01  a         a1
2   2011-01-12  q         q1
2   2011-02-01  d         d1
3   2011-11-01  s         s1
I need to retrieve the whole history of an id based on the date condition on any 1 row related to that ID.
date>='2011-11-01' should get me
ID  Date        column_a  column_b
--  ----------  --------  --------
1   2011-10-01  a         a1
1   2011-11-01  w         w1
1   2011-09-01  a         a1
3   2011-11-01  s         s1
I am aware you can get this by using a CTE or a subquery like
with selected_id as (
select id from historical_data where date>='2011-11-01'
)
select hd.* from historical_data hd
inner join selected_id si on hd.id = si.id
or
select * from historical_data
where id in (select id from historical_data where date>='2011-11-01')
In both these methods I have to query/scan the table `historical_data` twice.
I have indexes on both id and date so it's not a problem right now, but as the table grows this may cause issues.
The table above is a sample table, the table I have is about to touch 1TB in size with upwards of 600M rows.
Is there any way to achieve this by only querying the table once? (I am using Snowflake)
Using QUALIFY:
SELECT *
FROM historical_data
QUALIFY MAX(date) OVER(PARTITION BY id) >= '2011-11-01'::DATE;
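For readers without Snowflake at hand, the same idea can be tried locally. This is only a sketch: SQLite has no QUALIFY, so the window function is computed in a subquery and filtered in the outer WHERE, which is my substitution and not the Snowflake syntax above. It assumes SQLite 3.25 or later (for window functions) and ISO-formatted text dates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table historical_data "
    "(id integer, date text, column_a text, column_b text)"
)
conn.executemany(
    "insert into historical_data values (?, ?, ?, ?)",
    [
        (1, "2011-10-01", "a", "a1"),
        (1, "2011-11-01", "w", "w1"),
        (1, "2011-09-01", "a", "a1"),
        (2, "2011-01-12", "q", "q1"),
        (2, "2011-02-01", "d", "d1"),
        (3, "2011-11-01", "s", "s1"),
    ],
)

# Latest date per id, computed with a window function in a single
# reference to the table; keep every row of an id whose latest date
# meets the cutoff.
query = """
select id, date, column_a, column_b
from (
    select hd.*, max(hd.date) over (partition by hd.id) as max_date
    from historical_data hd
) x
where max_date >= '2011-11-01'
order by id, date
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Checking the per-id maximum works because "any row on or after the cutoff" is equivalent to "the latest row is on or after the cutoff".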

SQL Interleave multiple ordered tables

Let's say I have 2 tables with date ordered rows like:
products table:
date        name
----------  ----
09/01/2021  P1
12/01/2021  P2
22/01/2021  P3
and artworks table:
date        name
----------  ----
19/01/2018  A1
27/02/2019  A2
28/02/2021  A3
Is there any way in SQL to design a query that joins the 2 tables by "interleaving" them, taking the first 2 products, then 1 artwork, then the next 2 products, then the next artwork, and so on?
The result would be like:
date        name
----------  ----
09/01/2021  P1
12/01/2021  P2
19/01/2018  A1
22/01/2021  P3
27/02/2019  A2
You can use ROW_NUMBER() to produce interleaving numbering.
For example:
select date, name
from (
    select date, name,
           row_number() over (order by date) * 10 as rn
    from products
    union all
    select date, name,
           row_number() over (order by date) * 20 + 1 as rn
    from artworks
) x
order by rn
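The numbering trick is easier to see with the keys written out: products get 10, 20, 30, ... and artworks get 21, 41, 61, ..., so each artwork sorts right after every second product. A minimal sketch with Python's sqlite3 module, assuming SQLite 3.25 or later and ISO-formatted text dates (my change from the question's DD/MM/YYYY strings, which would not sort chronologically as text):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table products (date text, name text)")
conn.execute("create table artworks (date text, name text)")
conn.executemany(
    "insert into products values (?, ?)",
    [("2021-01-09", "P1"), ("2021-01-12", "P2"), ("2021-01-22", "P3")],
)
conn.executemany(
    "insert into artworks values (?, ?)",
    [("2018-01-19", "A1"), ("2019-02-27", "A2"), ("2021-02-28", "A3")],
)

# Products are keyed 10, 20, 30, ...; artworks 21, 41, 61, ...
# Sorting on the key yields two products, then one artwork, repeatedly.
query = """
select date, name
from (
    select date, name,
           row_number() over (order by date) * 10 as rn
    from products
    union all
    select date, name,
           row_number() over (order by date) * 20 + 1 as rn
    from artworks
) x
order by rn
"""
names = [name for _, name in conn.execute(query)]
print(names)
```

Once one table runs out, the remaining rows of the other simply follow in their own order, which is why A3 trails at the end here.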

SQL - Redshift Lag Function getting duplicates

I have a table below
ID Type Sub_ID Date CNT
A P A1 4/1/2020 5
A P A2 4/5/2020 NULL
A P A3 4/8/2020 NULL
What I want to get is
ID Type Sub_ID Date CNT LAG
A P A1 4/1/2020 5 NULL
A P A2 4/5/2020 NULL 5
A P A3 4/8/2020 NULL NULL
I have below queries but it's giving me duplicates like
ID Type Sub_ID Date CNT LAG
A P A1 4/1/2020 5 NULL
A P A1 4/1/2020 5 5 (duplicate)
A P A2 4/5/2020 NULL 5
A P A2 4/5/2020 NULL NULL (duplicate)
A P A3 4/8/2020 NULL NULL
select *, lag(cnt,1) over (partition by id, type order by date)
from mytable
Anything wrong?
OK, it turns out I have duplicate data in my table. I need to dedup first and then do the lag on top of the cleaned table.
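A rough sketch of that fix, run with Python's sqlite3 module as a stand-in for Redshift (SQLite 3.25 or later assumed). The duplicated A1 and A2 rows and the ISO date strings are my assumptions to reproduce the symptom; the dedup step is a plain SELECT DISTINCT in a CTE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table mytable "
    "(id text, type text, sub_id text, date text, cnt integer)"
)
conn.executemany(
    "insert into mytable values (?, ?, ?, ?, ?)",
    [
        ("A", "P", "A1", "2020-04-01", 5),
        ("A", "P", "A1", "2020-04-01", 5),     # assumed duplicate row
        ("A", "P", "A2", "2020-04-05", None),
        ("A", "P", "A2", "2020-04-05", None),  # assumed duplicate row
        ("A", "P", "A3", "2020-04-08", None),
    ],
)

# Remove exact duplicates first, then apply lag() to the cleaned rows.
query = """
with deduped as (
    select distinct id, type, sub_id, date, cnt
    from mytable
)
select id, type, sub_id, date, cnt,
       lag(cnt, 1) over (partition by id, type order by date) as lag_cnt
from deduped
order by date
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```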

Select distinct rows with max date with repeated and null values (Oracle)

I have 3 tables, let's say Root, Detail and Revision.
I need to select the distinct codes from Root with the highest revision date, taking into account that the revision lines may not exist and/or may have repeated values in the date column.
Root: idRoot, Code
Detail: idDetail, price, idRoot
Revision: idRevision, date, idDetail
So, I've started with the join query:
select code, price, date from Root r
inner join Detail d on d.idRoot = r.idRoot
left join Revision rev on d.idDetail = rev.idDetail;
Having table results like this:
CODE  PRICE  DATE      idRevision
----  -----  --------  ----------
C1    100    2/1/2016  1
C1    120    2/1/2016  3
C1    150    null      2
C1    200    1/1/2016  4
C2    300    null      null
C3    400    3/1/2016  6
But what I really need is the next result:
CODE  PRICE  DATE      idRevision
----  -----  --------  ----------
C1    120    2/1/2016  3
C2    300    null      null
C3    400    3/1/2016  6
I've seen several answers for similar cases, but never with null and repeated values:
Oracle: Taking the record with the max date
Fetch the row which has the Max value for a column
Oracle Select Max Date on Multiple records
Any kind of help would be really appreciated
You can use row_number():
select code, price, date
from (select code, price, date,
             row_number() over (partition by code
                                order by date desc nulls last, idRevision desc) as seqnum
      from Root r inner join
           Detail d
           on d.idRoot = r.idRoot left join
           Revision rev
           on d.idDetail = rev.idDetail
     ) rdr
where seqnum = 1;
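To see how the ordering resolves both the null dates and the tied dates, here is a small sketch using Python's sqlite3 module in place of Oracle (SQLite 3.25 or later assumed). The table contents are reconstructed from the join result in the question, the snake_case column names and ISO dates are hypothetical, and `(rev_date is null)` is used as a portable stand-in for `nulls last`, which older SQLite builds do not accept.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table root (id_root integer, code text);
    create table detail (id_detail integer, price integer, id_root integer);
    create table revision (id_revision integer, rev_date text, id_detail integer);

    insert into root values (1, 'C1'), (2, 'C2'), (3, 'C3');
    insert into detail values
        (1, 100, 1), (2, 120, 1), (3, 150, 1), (4, 200, 1),
        (5, 300, 2), (6, 400, 3);
    -- C2's detail row has no revision; revision 2 has a null date.
    insert into revision values
        (1, '2016-01-02', 1), (3, '2016-01-02', 2), (2, null, 3),
        (4, '2016-01-01', 4), (6, '2016-01-03', 6);
    """
)

# Per code: non-null dates first, latest date first, and the highest
# revision id breaks ties between equal dates.
query = """
select code, price, rev_date, id_revision
from (
    select r.code, d.price, rev.rev_date, rev.id_revision,
           row_number() over (
               partition by r.code
               order by (rev.rev_date is null),
                        rev.rev_date desc,
                        rev.id_revision desc
           ) as seqnum
    from root r
    join detail d on d.id_root = r.id_root
    left join revision rev on rev.id_detail = d.id_detail
) rdr
where seqnum = 1
order by code
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

The tie-breaker on the revision id is what makes the pick deterministic for C1, where two revisions share the latest date.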

SQL Search for records with missing values in the same table

I have a table (t1) with multiple rows of statuses for different references, one column being a ReferenceID and another column being a StatusID.
t1.ReferenceID - t1.StatusID
A1 - 1
A1 - 2
A1 - 3
A1 - 4
A2 - 1
A2 - 3
A3 - 1
A3 - 3
A4 - 1
A4 - 4
A5 - 2
A5 - 3
I have a second table (t2) which is the list of all available StatusID's
t2.StatusID
1
2
3
4
I need to be able to pull a list of ReferenceID's from t1 where StatusID '1' exists, however it is missing one or more of the other StatusID's in table 2.
i.e. using the above the following referenceID's would be returned:
A2
A3
A4
I don't know if this will work on SQL Anywhere.
SELECT DISTINCT r.ReferenceID
FROM (SELECT ReferenceID FROM TableName WHERE StatusID = 1 GROUP BY ReferenceID) r
CROSS JOIN (SELECT StatusID FROM TableName GROUP BY StatusID) d
LEFT JOIN TableName a
ON d.StatusID = a.StatusID AND
r.ReferenceID = a.ReferenceID
WHERE a.StatusID IS NULL
ORDER BY r.ReferenceID
SQLFiddle Demo (running in MySQL)
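The same cross join plus anti-join pattern can be checked against the sample data from the question. This is a sketch with Python's sqlite3 module rather than SQL Anywhere, and it takes the status list from t2 instead of deriving it from the status table itself, which is my variation on the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t1 (ReferenceID text, StatusID integer)")
conn.execute("create table t2 (StatusID integer)")
conn.executemany(
    "insert into t1 values (?, ?)",
    [
        ("A1", 1), ("A1", 2), ("A1", 3), ("A1", 4),
        ("A2", 1), ("A2", 3),
        ("A3", 1), ("A3", 3),
        ("A4", 1), ("A4", 4),
        ("A5", 2), ("A5", 3),
    ],
)
conn.executemany("insert into t2 values (?)", [(1,), (2,), (3,), (4,)])

# Pair every reference that has status 1 with every possible status,
# then keep the pairs that have no matching row in t1.
query = """
select distinct r.ReferenceID
from (select distinct ReferenceID from t1 where StatusID = 1) r
cross join t2 d
left join t1 a
  on a.StatusID = d.StatusID
 and a.ReferenceID = r.ReferenceID
where a.StatusID is null
order by r.ReferenceID
"""
missing = [ref for (ref,) in conn.execute(query)]
print(missing)
```

A1 drops out because it has every status, and A5 never enters the candidate list because it lacks status 1.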