Find IDs with only one observation in PostgreSQL

I have the following table:
CREATE TABLE my_table (
the_visitor_id varchar(5) NOT NULL,
the_visitor_visit timestamp NOT NULL,
the_visitor_returning text
);
INSERT INTO my_table
VALUES ('VIS01', '2019-05-02 09:00:00','YES' ),
('VIS01', '2019-05-04 12:00:00',NULL ),
('VIS01', '2019-05-05 18:00:00',NULL ),
('VIS02', '2019-05-06 18:30:00',NULL),
('VIS02', '2019-05-15 12:00:00',NULL),
('VIS03', '2019-06-30 18:00:00','YES'),
('VIS04', '2019-06-30 18:00:00',NULL);
I would like to filter out every the_visitor_id that has only one observation (record). In this case those are VIS03 and VIS04, so I must end up with VIS01 and VIS02. I tried this:
SELECT DISTINCT ON(the_visitor_id) the_visitor_id,
the_visitor_visit, the_visitor_returning
FROM my_table
The expected result should be:
the_visitor_id  the_visitor_visit    the_visitor_returning
VIS01           2019-05-02 09:00:00  YES
VIS01           2019-05-04 12:00:00
VIS01           2019-05-05 18:00:00
VIS02           2019-05-06 18:30:00
VIS02           2019-05-15 12:00:00
But I guess that something like a rank is needed. Any help will be greatly appreciated.

There are probably other ways of doing this, but you can create a CTE containing only the visitor IDs that have more than one row, then join it to your table. If my_table is large, an index on the_visitor_id should help performance.
WITH cte
AS (
SELECT the_visitor_id
FROM my_table
GROUP BY the_visitor_id
HAVING count(*) > 1
)
SELECT my_table.*
FROM my_table
INNER JOIN cte ON cte.the_visitor_id = my_table.the_visitor_id
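To try the GROUP BY / HAVING approach without a PostgreSQL server, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is only a stand-in here (an assumption, not what the question uses), and the timestamp column is stored as TEXT for simplicity.

```python
import sqlite3

# In-memory SQLite database as a stand-in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE my_table (
    the_visitor_id        TEXT NOT NULL,
    the_visitor_visit     TEXT NOT NULL,
    the_visitor_returning TEXT
);
INSERT INTO my_table VALUES
    ('VIS01', '2019-05-02 09:00:00', 'YES'),
    ('VIS01', '2019-05-04 12:00:00', NULL),
    ('VIS01', '2019-05-05 18:00:00', NULL),
    ('VIS02', '2019-05-06 18:30:00', NULL),
    ('VIS02', '2019-05-15 12:00:00', NULL),
    ('VIS03', '2019-06-30 18:00:00', 'YES'),
    ('VIS04', '2019-06-30 18:00:00', NULL);
""")

# Keep only the rows whose visitor ID occurs more than once.
rows = conn.execute("""
WITH cte AS (
    SELECT the_visitor_id
    FROM my_table
    GROUP BY the_visitor_id
    HAVING count(*) > 1
)
SELECT m.the_visitor_id, m.the_visitor_visit, m.the_visitor_returning
FROM my_table AS m
JOIN cte ON cte.the_visitor_id = m.the_visitor_id
ORDER BY m.the_visitor_id, m.the_visitor_visit
""").fetchall()

kept_ids = sorted({r[0] for r in rows})
print(kept_ids)
```

With the sample data, only the rows of VIS01 and VIS02 should remain.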

EXISTS can use an index:
SELECT the_visitor_id, the_visitor_visit, the_visitor_returning
FROM my_table t1
WHERE EXISTS (
SELECT FROM my_table
WHERE the_visitor_id = t1.the_visitor_id
AND ctid <> t1.ctid
);
Using ctid because you didn't disclose the PK or any UNIQUE column of the table. About ctid:
Postgresql group by for multiple lines
Ideally, you would have a UNIQUE index on (the_visitor_id, any_notnull_column) and use that column in the query. Substantially faster than a full sequential scan, count, join (another seq or idx scan).
Barring any usable index, using a window function allows us to at least keep it to a single sequential scan:
SELECT the_visitor_id, the_visitor_visit, the_visitor_returning
FROM (
SELECT *, count(*) OVER (PARTITION BY the_visitor_id) AS ct
FROM my_table
) sub
WHERE ct > 1;
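The EXISTS idea can be sketched outside PostgreSQL as well. The example below uses Python's sqlite3 module, where the implicit rowid column plays roughly the role that ctid plays above. That mapping is an analogy I am assuming for illustration; with a real primary key you would compare on that column instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE my_table (
    the_visitor_id        TEXT NOT NULL,
    the_visitor_visit     TEXT NOT NULL,
    the_visitor_returning TEXT
);
-- Index on the column used by the correlated lookup.
CREATE INDEX my_table_visitor_idx ON my_table (the_visitor_id);
INSERT INTO my_table VALUES
    ('VIS01', '2019-05-02 09:00:00', 'YES'),
    ('VIS01', '2019-05-04 12:00:00', NULL),
    ('VIS01', '2019-05-05 18:00:00', NULL),
    ('VIS02', '2019-05-06 18:30:00', NULL),
    ('VIS02', '2019-05-15 12:00:00', NULL),
    ('VIS03', '2019-06-30 18:00:00', 'YES'),
    ('VIS04', '2019-06-30 18:00:00', NULL);
""")

# A row qualifies if some *other* row exists for the same visitor.
exists_rows = conn.execute("""
SELECT t1.the_visitor_id, t1.the_visitor_visit
FROM my_table AS t1
WHERE EXISTS (
    SELECT 1
    FROM my_table AS t2
    WHERE t2.the_visitor_id = t1.the_visitor_id
      AND t2.rowid <> t1.rowid
)
ORDER BY t1.the_visitor_id, t1.the_visitor_visit
""").fetchall()

for row in exists_rows:
    print(row)
```

The subquery can stop at the first matching row, which is why this form tends to benefit from an index on the_visitor_id.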

Related

How to join static table once and use it in building complex queries

--table 1
CREATE TABLE test1 (
e_id NUMBER(10),
test_col1 NUMBER(10)
);
INSERT INTO test1 VALUES(1,62);
--table 2
CREATE TABLE test2 (
e_id NUMBER(10),
test_col2 NUMBER(10)
);
INSERT INTO test2 VALUES(1,63);
--Static table
CREATE TABLE lookup_table (
l_id NUMBER(10),
l_value VARCHAR2(30)
);
INSERT INTO lookup_table VALUES(62,'value_1');
INSERT INTO lookup_table VALUES(63,'value_2');
DB version: Oracle 18c
I want to create a view based on table 1, table 2 and static table (lookup/reference table).
Basically I need to pull all the EUCs that exist in table1, along with two additional columns, lookup_value1 and lookup_value2. I tried joining the two tables and then joining the static table to fetch l_value from the lookup table based on the IDs present in table1 and table2.
My attempt:
SELECT t1.e_id,
lt.l_value AS lookup_value1,
lt1.l_value AS lookup_value2
FROM test1 t1
LEFT JOIN test2 t2 ON(t1.e_id = t2.e_id)
LEFT JOIN lookup_table lt ON(lt.l_id = t1.test_col1)
LEFT JOIN lookup_table lt1 ON(lt1.l_id = t2.test_col2);
This gives me the expected result, but the problem is that I need to join lookup_table every time I fetch a value from it. In my case, I have joined lookup_table twice. Is there any way to join this table only once and fetch the required values from it, instead of joining it again and again, which will degrade performance?
In my experience, there are two ways to resolve this problem:
Use a trigger to add one record into lookup_table. You then need to handle how the value of the l_value field is provided.
Don't use lookup_table; instead, add a column holding the l_value field's value to the test1 and test2 tables to store the static data there.
If you are not going to have duplicate e_id rows, then you could use UNION ALL, join once, and PIVOT:
SELECT e_id, l_value1, l_value2
FROM (
SELECT t.e_id, t.type, l.l_value
FROM ( SELECT e_id, 1 AS type, test_col1 AS test_col FROM test1
UNION ALL
SELECT e_id, 2, test_col2 FROM test2 ) t
LEFT OUTER JOIN lookup_table l
ON (t.test_col = l.l_id)
)
PIVOT ( MAX(l_value) FOR type IN (1 AS l_value1, 2 AS l_value2) )
Which, for the sample data, outputs:
E_ID  L_VALUE1  L_VALUE2
1     value_1   value_2
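The same "stack with UNION ALL, join the lookup once, then spread back into columns" shape can be tried in Python's sqlite3 module. SQLite has no PIVOT clause, so this sketch swaps in conditional aggregation (MAX over a CASE expression) for the pivot step; the src column name is hypothetical.

```python
import sqlite3

# SQLite stand-in for the Oracle schema (assumption); NUMBER/VARCHAR2
# are replaced by INTEGER/TEXT.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE test1 (e_id INTEGER, test_col1 INTEGER);
CREATE TABLE test2 (e_id INTEGER, test_col2 INTEGER);
CREATE TABLE lookup_table (l_id INTEGER, l_value TEXT);
INSERT INTO test1 VALUES (1, 62);
INSERT INTO test2 VALUES (1, 63);
INSERT INTO lookup_table VALUES (62, 'value_1');
INSERT INTO lookup_table VALUES (63, 'value_2');
""")

rows = conn.execute("""
SELECT t.e_id,
       MAX(CASE WHEN t.src = 1 THEN l.l_value END) AS l_value1,
       MAX(CASE WHEN t.src = 2 THEN l.l_value END) AS l_value2
FROM (
    -- Stack both source tables, tagging each row with its origin.
    SELECT e_id, 1 AS src, test_col1 AS test_col FROM test1
    UNION ALL
    SELECT e_id, 2, test_col2 FROM test2
) AS t
LEFT JOIN lookup_table AS l ON l.l_id = t.test_col   -- joined only once
GROUP BY t.e_id
""").fetchall()

print(rows)
```

One difference from the original LEFT JOIN query worth keeping in mind: an e_id that exists only in test2 would also show up here, so an extra filter may be needed if only test1's IDs are wanted.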
Or, the same query using sub-query factoring clauses:
WITH complex_query1 (e_id, test_col1) AS (
SELECT * FROM test1
),
complex_query2 (e_id, test_col2) AS (
SELECT * FROM test2
),
combined_query (e_id, type, test_col) AS (
SELECT e_id, 1, test_col1 FROM complex_query1
UNION ALL
SELECT e_id, 2, test_col2 FROM complex_query2
),
lookup_values (e_id, type, l_value) AS (
SELECT t.e_id, t.type, l.l_value
FROM combined_query t
LEFT OUTER JOIN lookup_table l
ON (t.test_col = l.l_id)
)
SELECT e_id, l_value1, l_value2
FROM lookup_values
PIVOT ( MAX(l_value) FOR type IN (1 AS l_value1, 2 AS l_value2) )

Compare a single-column row-set with another single-column row set in Oracle SQL

Is there any Oracle SQL operator or function that checks whether two result sets are exactly the same? My current idea is to use the MINUS operator in both directions, but I am looking for a better and more performant solution. One result set is fixed (see below); the other depends on the records.
Very important: I am not allowed to change the schema or structure, so CREATE TABLE, CREATE TYPE, etc. are not allowed for me here. The solution must also work on Oracle 11g.
The schema for SQL Fiddle is:
CREATE TABLE DETAILS (ID INT, MAIN_ID INT, VALUE INT);
INSERT INTO DETAILS VALUES (1,1,1);
INSERT INTO DETAILS VALUES (2,1,2);
INSERT INTO DETAILS VALUES (3,1,3);
INSERT INTO DETAILS VALUES (4,1,4);
INSERT INTO DETAILS VALUES (5,2,1);
INSERT INTO DETAILS VALUES (6,2,2);
INSERT INTO DETAILS VALUES (7,3,1);
INSERT INTO DETAILS VALUES (7,3,2);
This is my current SQL query, which does the job (it selects the MAIN_IDs whose VALUEs are exactly the same as the given list):
SELECT DISTINCT D.MAIN_ID FROM DETAILS D WHERE NOT EXISTS
(SELECT VALUE FROM DETAILS WHERE MAIN_ID=D.MAIN_ID
MINUS
SELECT * FROM TABLE(SYS.ODCINUMBERLIST(1, 2)))
AND NOT EXISTS
(SELECT * FROM TABLE(SYS.ODCINUMBERLIST(1, 2))
MINUS
SELECT VALUE FROM DETAILS WHERE MAIN_ID=D.MAIN_ID)
The SQL Fiddle link: http://sqlfiddle.com/#!4/25dde/7/0
If you use a nested table collection type (rather than a VARRAY), then you can aggregate the values into a collection and directly compare two collections:
CREATE TYPE int_list AS TABLE OF INT;
Then:
SELECT main_id
FROM details
GROUP BY main_id
HAVING CAST( COLLECT( value ) AS int_list ) = int_list( 1, 2 );
Outputs:
| MAIN_ID |
| ------: |
| 2 |
| 3 |
Update
Based on your expanded fiddle in comments, you can use:
SELECT B.ID
FROM BUSINESS_DATA B
INNER JOIN BUSINESS_NAME N
ON ( B.NAME_ID=N.ID )
WHERE N.NAME='B1'
AND EXISTS (
SELECT business_id
FROM ORDERS O
LEFT OUTER JOIN TABLE(
SYS.ODCIDATELIST( DATE '2021-01-03', DATE '2020-04-07', DATE '2020-05-07' )
) d
ON ( o.orderdate = d.COLUMN_VALUE )
WHERE O.BUSINESS_ID=B.ID
GROUP BY business_id
HAVING COUNT( CASE WHEN d.COLUMN_VALUE IS NULL THEN 1 END ) = 0
AND COUNT( DISTINCT o.orderdate )
= ( SELECT COUNT(DISTINCT COLUMN_VALUE) FROM TABLE( SYS.ODCIDATELIST( DATE '2021-01-03', DATE '2020-04-07', DATE '2020-05-07' ) ) )
)
(Note: Do not implicitly create dates from strings; it will cause the query to fail, without there being any changes to the query text, if a user changes their NLS_DATE_FORMAT session parameter. Instead use TO_DATE with an appropriate format model or a DATE literal.)
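The HAVING test used above (no value outside the list, and as many distinct values as the list has) can be tried on the DETAILS sample from the question. This is a rough sketch in Python's sqlite3 module, not Oracle; main_ids_matching is a hypothetical helper, and it assumes the list is non-empty and contains no NULLs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE details (id INTEGER, main_id INTEGER, value INTEGER);
INSERT INTO details VALUES
    (1,1,1), (2,1,2), (3,1,3), (4,1,4),
    (5,2,1), (6,2,2),
    (7,3,1), (7,3,2);
""")

def main_ids_matching(conn, wanted):
    """Return the main_ids whose set of values equals `wanted` exactly."""
    wanted = sorted(set(wanted))
    # Only the placeholders are interpolated; the values are bound.
    marks = ",".join("?" for _ in wanted)
    sql = f"""
        SELECT main_id
        FROM details
        GROUP BY main_id
        HAVING COUNT(CASE WHEN value NOT IN ({marks}) THEN 1 END) = 0
           AND COUNT(DISTINCT value) = ?
        ORDER BY main_id
    """
    return [r[0] for r in conn.execute(sql, [*wanted, len(wanted)])]

exact_12 = main_ids_matching(conn, [1, 2])
exact_1234 = main_ids_matching(conn, [1, 2, 3, 4])
print(exact_12, exact_1234)
```

The first condition rejects groups with extra values, and the second rejects groups that are missing some of the wanted values.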

Firebird select from table distinct one field

The question I asked yesterday was simplified but I realize that I have to report the whole story.
I have to extract data from 4 different tables in a Firebird 2.5 database, and the following query works:
SELECT
PRODUZIONE_T.CODPRODUZIONE,
PRODUZIONE_T.NUMEROCOMMESSA as numeroco,
ANGCLIENTIFORNITORI.RAGIONESOCIALE1,
PRODUZIONE_T.DATACONSEGNA,
PRODUZIONE_T.REVISIONE,
ANGUTENTI.NOMINATIVO,
ORDINI_T.DATA
FROM PRODUZIONE_T
LEFT OUTER JOIN ORDINI_T ON PRODUZIONE_T.CODORDINE=ORDINI_T.CODORDINE
INNER JOIN ANGCLIENTIFORNITORI ON ANGCLIENTIFORNITORI.CODCLIFOR=ORDINI_T.CODCLIFOR
LEFT OUTER JOIN ANGUTENTI ON ANGUTENTI.IDUTENTE = PRODUZIONE_T.RESPONSABILEUC
ORDER BY right(numeroco,2) DESC, left(numeroco,3) desc
rows 1 to 500;
However, the query returns duplicate (or more) rows because of the REVISIONE column.
How do I select, for each NUMEROCOMMESSA, only the row with the maximum REVISIONE value?
This should work:
select TAB1.COD, TAB1."ORDER", TAB1.S_DATE, TAB1.REVISION
FROM TAB1
JOIN
(
select "ORDER", MAX(REVISION) as REVISION
FROM TAB1
Group By "ORDER"
) m on m."ORDER" = TAB1."ORDER" and m.REVISION = TAB1.REVISION
Here you go - http://sqlfiddle.com/#!6/ce7cf/4
Sample data (as you set it in your original question):
create table TAB1 (
cod integer primary key,
n_order varchar(10) not null,
s_date date not null,
revision integer not null );
alter table tab1 add constraint UQ1 unique (n_order,revision);
insert into TAB1 values ( 1, '001/18', '2018-02-01', 0 );
insert into TAB1 values ( 2, '002/18', '2018-01-31', 0 );
insert into TAB1 values ( 3, '002/18', '2018-01-30', 1 );
The query:
select *
from tab1 d
join ( select n_ORDER, MAX(REVISION) as REVISION
FROM TAB1
Group By n_ORDER ) m
on m.n_ORDER = d.n_ORDER and m.REVISION = d.REVISION
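If you have no Firebird instance at hand, the join against the per-order maximum can be checked with Python's sqlite3 module. SQLite is an assumption here, used purely as a scratch engine; the table and data are the sample above, with the date stored as TEXT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tab1 (
    cod      INTEGER PRIMARY KEY,
    n_order  TEXT NOT NULL,
    s_date   TEXT NOT NULL,
    revision INTEGER NOT NULL,
    UNIQUE (n_order, revision)
);
INSERT INTO tab1 VALUES (1, '001/18', '2018-02-01', 0);
INSERT INTO tab1 VALUES (2, '002/18', '2018-01-31', 0);
INSERT INTO tab1 VALUES (3, '002/18', '2018-01-30', 1);
""")

# Derived table m holds the highest revision per order; joining back
# on (n_order, revision) keeps only the row carrying that revision.
latest_rows = conn.execute("""
SELECT d.cod, d.n_order, d.s_date, d.revision
FROM tab1 AS d
JOIN (
    SELECT n_order, MAX(revision) AS revision
    FROM tab1
    GROUP BY n_order
) AS m
  ON m.n_order = d.n_order AND m.revision = d.revision
ORDER BY d.cod
""").fetchall()

for row in latest_rows:
    print(row)
```

Order 002/18 has revisions 0 and 1, so only its revision 1 row should survive.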
Suggestions:
Google and read the classic book: "Understanding SQL" by Martin Gruber
Read Firebird SQL reference: https://www.firebirdsql.org/file/documentation/reference_manuals/fblangref25-en/html/fblangref25.html
Here is one more solution, using the window functions introduced in Firebird 3 - http://sqlfiddle.com/#!6/ce7cf/13
I do not have Firebird 3 at hand, so I cannot actually check whether there is some unexpected incompatibility; do it at home :-D
SELECT * FROM
(
SELECT
TAB1.*,
ROW_NUMBER() OVER (
PARTITION BY n_order
ORDER BY revision DESC
) AS rank
FROM TAB1
) d
WHERE rank = 1
Read documentation
https://community.modeanalytics.com/sql/tutorial/sql-window-functions/
https://www.firebirdsql.org/file/documentation/release_notes/html/en/3_0/rnfb30-dml-windowfuncs.html
Which of the three solutions (including Gordon's) is fastest depends on the specific database: the real data, the existing indexes, and the selectivity of those indexes.
While window functions give a join-less query, I am not sure it would be faster on real data, because it may simply ignore the indexes on the (order, revision) tuple and do a full scan before the rank=1 condition is applied. The first solution, on the other hand, would most probably use indexes to get the maximums without actually reading every row in the table.
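Whichever variant wins on speed, the two should at least agree on the result. A quick way to convince yourself is to run both on the sample data and compare; the sketch below does that in Python's sqlite3 module (an assumption, standing in for Firebird 3, and it needs an SQLite build with window function support, 3.25 or later).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tab1 (
    cod      INTEGER PRIMARY KEY,
    n_order  TEXT NOT NULL,
    s_date   TEXT NOT NULL,
    revision INTEGER NOT NULL,
    UNIQUE (n_order, revision)
);
INSERT INTO tab1 VALUES (1, '001/18', '2018-02-01', 0);
INSERT INTO tab1 VALUES (2, '002/18', '2018-01-31', 0);
INSERT INTO tab1 VALUES (3, '002/18', '2018-01-30', 1);
""")

# Window-function variant: number the rows of each order, newest first.
window_rows = conn.execute("""
SELECT cod, n_order, revision
FROM (
    SELECT t.*,
           ROW_NUMBER() OVER (PARTITION BY n_order ORDER BY revision DESC) AS rn
    FROM tab1 AS t
) AS ranked
WHERE rn = 1
ORDER BY cod
""").fetchall()

# Join variant: match each row against its order's maximum revision.
join_rows = conn.execute("""
SELECT d.cod, d.n_order, d.revision
FROM tab1 AS d
JOIN (SELECT n_order, MAX(revision) AS revision
      FROM tab1 GROUP BY n_order) AS m
  ON m.n_order = d.n_order AND m.revision = d.revision
ORDER BY d.cod
""").fetchall()

print(window_rows == join_rows)
```

Because (n_order, revision) is unique in the sample table, there are no ties, so both variants pick the same single row per order.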
The Firebird-support mailing list suggested a way to do this with only a single query. The trick is to use both window functions and a CTE (common table expression): http://sqlfiddle.com/#!18/ce7cf/2
WITH TMP AS (
SELECT
*,
MAX(revision) OVER (
PARTITION BY n_order
) as max_REV
FROM TAB1
)
SELECT * FROM TMP
WHERE revision = max_REV
If you want the max revision number in Firebird:
select t.*
from tab1 t
where t.revision = (select max(t2.revision) from tab1 t2 where t2.order = t.order);
For performance, you want an index on tab1(order, revision). With such an index, performance should be competitive with any other approach.
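To experiment with the correlated subquery and the suggested composite index, here is a small sketch in Python's sqlite3 module. SQLite is an assumption (it is not Firebird, and its planner behaves differently), and the column is named n_order as in the sample table above because ORDER is a reserved word; the index name is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tab1 (
    cod      INTEGER PRIMARY KEY,
    n_order  TEXT NOT NULL,
    s_date   TEXT NOT NULL,
    revision INTEGER NOT NULL
);
-- Composite index: the per-order MAX(revision) can be read from it.
CREATE INDEX tab1_order_revision ON tab1 (n_order, revision);
INSERT INTO tab1 VALUES (1, '001/18', '2018-02-01', 0);
INSERT INTO tab1 VALUES (2, '002/18', '2018-01-31', 0);
INSERT INTO tab1 VALUES (3, '002/18', '2018-01-30', 1);
""")

query = """
SELECT t.cod, t.n_order, t.revision
FROM tab1 AS t
WHERE t.revision = (SELECT MAX(t2.revision)
                    FROM tab1 AS t2
                    WHERE t2.n_order = t.n_order)
ORDER BY t.cod
"""

latest = conn.execute(query).fetchall()
print(latest)

# The plan text varies between SQLite versions, so it is only printed.
for step in conn.execute("EXPLAIN QUERY PLAN " + query):
    print(step)
```

On a real database, compare the plans with and without the index on your own data before drawing conclusions.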

Update date range in Postgres table

I have a table with dates:
select id, date from date_ranges where range_id = 1;
1 2016-04-12
2 2016-04-13
3 2016-04-14
I also have an array, for example:
array('2016-04-11','2016-04-12','2016-04-13','2016-04-14','2016-04-15')
or
array('2016-04-13','2016-04-14','2016-04-15')
How can I insert the new values from the array into my table without changing the existing table values?
And if I have the second array, how can I delete the value 2016-04-12 from the table?
Please help, I need to do this in one query.
WITH current_values AS (
SELECT generate_series('2016-04-13'::DATE, '2016-04-17'::DATE, '1 day')::DATE AS date
),
deleted_values AS (
DELETE FROM date_ranges WHERE date NOT IN (SELECT * FROM current_values) RETURNING id
)
INSERT INTO date_ranges ("date", range_id)
WITH new_values AS (
SELECT new."date"
FROM current_values AS new
LEFT JOIN date_ranges AS old
ON old."date" = new."date"
WHERE old.id IS NULL
)
SELECT date, 1 FROM new_values;
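The query above deletes the dates that are no longer wanted and inserts the missing ones in one statement. For comparison, here is a rough sketch of the same synchronisation in Python's sqlite3 module. SQLite has no data-modifying CTEs, so this swaps in two statements inside one transaction; sync_dates is a hypothetical helper, it assumes a non-empty list, and unlike the query above it restricts the DELETE to the given range_id.

```python
import sqlite3

def sync_dates(conn, range_id, wanted):
    """Make date_ranges hold exactly the dates in `wanted` for one range_id."""
    wanted = sorted(set(wanted))
    marks = ",".join("?" for _ in wanted)
    with conn:  # one transaction: commit on success, roll back on error
        # Remove dates that are not in the new list.
        conn.execute(
            f'DELETE FROM date_ranges '
            f'WHERE range_id = ? AND "date" NOT IN ({marks})',
            [range_id, *wanted],
        )
        # Insert only the dates that are not stored yet, so existing
        # rows (and their ids) stay untouched.
        conn.executemany(
            'INSERT INTO date_ranges ("date", range_id) '
            'SELECT ?, ? WHERE NOT EXISTS ('
            '  SELECT 1 FROM date_ranges WHERE "date" = ? AND range_id = ?)',
            [(d, range_id, d, range_id) for d in wanted],
        )

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE date_ranges (
    id       INTEGER PRIMARY KEY,
    "date"   TEXT NOT NULL,
    range_id INTEGER NOT NULL
);
INSERT INTO date_ranges (id, "date", range_id) VALUES
    (1, '2016-04-12', 1), (2, '2016-04-13', 1), (3, '2016-04-14', 1);
""")

sync_dates(conn, 1, ["2016-04-13", "2016-04-14", "2016-04-15"])
final = conn.execute(
    'SELECT id, "date" FROM date_ranges WHERE range_id = 1 ORDER BY "date"'
).fetchall()
print(final)
```

After the call, 2016-04-12 is gone, 2016-04-15 is new, and the rows with id 2 and 3 are left as they were.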

Simple Query to Grab Max Value for each ID

OK I have a table like this:
ID   Signal  Station  OwnerID
111  -120    Home     1
111  -130    Car      1
111  -135    Work     2
222  -98     Home     2
222  -95     Work     1
222  -103    Work     2
This is all for the same day. I just need the Query to return the max signal for each ID:
ID   Signal  Station  OwnerID
111  -120    Home     1
222  -95     Work     1
I tried using MAX() and the aggregation messes up with the Station and OwnerID being different for each record. Do I need to do a JOIN?
Something like this? Join your table with itself, and exclude the rows for which a higher signal was found.
select cur.id, cur.signal, cur.station, cur.ownerid
from yourtable cur
where not exists (
select *
from yourtable high
where high.id = cur.id
and high.signal > cur.signal
)
This would list one row for each highest signal, so there might be multiple rows per id.
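To see the tie behaviour, here is a small sketch in Python's sqlite3 module (an assumed stand-in, since the question names no DBMS). The table name readings and the extra (222, -95, 'Car', 2) row are hypothetical; the extra row is added only to create a tie for ID 222.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE readings (id INTEGER, signal INTEGER, station TEXT, ownerid INTEGER);
INSERT INTO readings VALUES
    (111, -120, 'Home', 1),
    (111, -130, 'Car',  1),
    (111, -135, 'Work', 2),
    (222,  -98, 'Home', 2),
    (222,  -95, 'Work', 1),
    (222, -103, 'Work', 2),
    (222,  -95, 'Car',  2);  -- hypothetical row that ties the maximum
""")

# Keep a row only if no row with the same id has a higher signal.
max_rows = conn.execute("""
SELECT cur.id, cur.signal, cur.station, cur.ownerid
FROM readings AS cur
WHERE NOT EXISTS (
    SELECT 1
    FROM readings AS high
    WHERE high.id = cur.id
      AND high.signal > cur.signal
)
ORDER BY cur.id, cur.station
""").fetchall()

for row in max_rows:
    print(row)
```

ID 222 comes back twice because both of its -95 rows are "highest"; without the extra row there would be one row per ID.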
You are doing a group-wise maximum/minimum operation. This is a common trap: it feels like something that should be easy to do, but in SQL it aggravatingly isn't.
There are a number of approaches (both standard ANSI and vendor-specific) to this problem, most of which are sub-optimal in many situations. Some will give you multiple rows when more than one row shares the same maximum/minimum value; some won't. Some work well on tables with a small number of groups; others are more efficient for a larger number of groups with smaller rows per group.
Here's a discussion of some of the common ones (MySQL-biased but generally applicable). Personally, if I know there are no multiple maxima (or don't care about getting them) I often tend towards the null-left-self-join method, which I'll post as no-one else has yet:
SELECT reading.ID, reading.Signal, reading.Station, reading.OwnerID
FROM readings AS reading
LEFT JOIN readings AS highersignal
ON highersignal.ID=reading.ID AND highersignal.Signal>reading.Signal
WHERE highersignal.ID IS NULL;
In classic SQL-92 (not using the OLAP operations used by Quassnoi), you can use:
SELECT g.ID, g.MaxSignal, t.Station, t.OwnerID
FROM (SELECT id, MAX(Signal) AS MaxSignal
FROM t
GROUP BY id) AS g
JOIN t ON g.id = t.id AND g.MaxSignal = t.Signal;
(Unchecked syntax; assumes your table is 't'.)
The sub-query in the FROM clause identifies the maximum signal value for each id; the join combines that with the corresponding data row from the main table.
NB: if there are several entries for a specific ID that all have the same signal strength and that strength is the MAX(), then you will get several output rows for that ID.
Tested against IBM Informix Dynamic Server 11.50.FC3 running on Solaris 10:
+ CREATE TEMP TABLE signal_info
(
id INTEGER NOT NULL,
signal INTEGER NOT NULL,
station CHAR(5) NOT NULL,
ownerid INTEGER NOT NULL
);
+ INSERT INTO signal_info VALUES(111, -120, 'Home', 1);
+ INSERT INTO signal_info VALUES(111, -130, 'Car' , 1);
+ INSERT INTO signal_info VALUES(111, -135, 'Work', 2);
+ INSERT INTO signal_info VALUES(222, -98 , 'Home', 2);
+ INSERT INTO signal_info VALUES(222, -95 , 'Work', 1);
+ INSERT INTO signal_info VALUES(222, -103, 'Work', 2);
+ SELECT g.ID, g.MaxSignal, t.Station, t.OwnerID
FROM (SELECT id, MAX(Signal) AS MaxSignal
FROM signal_info
GROUP BY id) AS g
JOIN signal_info AS t ON g.id = t.id AND g.MaxSignal = t.Signal;
111 -120 Home 1
222 -95 Work 1
I named the table Signal_Info for this test - but it seems to produce the right answer.
This only shows that there is at least one DBMS that supports the notation. However, I am a little surprised that MS SQL Server does not - which version are you using?
It never ceases to surprise me how often SQL questions are submitted without table names.
WITH q AS
(
SELECT c.*, ROW_NUMBER() OVER (PARTITION BY id ORDER BY signal DESC) rn
FROM mytable c
)
SELECT *
FROM q
WHERE rn = 1
This will return one row even if there are duplicates of MAX(signal) for a given ID.
Having an index on (id, signal) will greatly improve this query.
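A sketch of the ROW_NUMBER approach with tied maxima, using Python's sqlite3 module as an assumed stand-in (window functions need SQLite 3.25 or later). The table name readings, the index name, and the extra tied row for ID 222 are hypothetical additions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE readings (id INTEGER, signal INTEGER, station TEXT, ownerid INTEGER);
CREATE INDEX readings_id_signal ON readings (id, signal);
INSERT INTO readings VALUES
    (111, -120, 'Home', 1),
    (111, -130, 'Car',  1),
    (111, -135, 'Work', 2),
    (222,  -98, 'Home', 2),
    (222,  -95, 'Work', 1),
    (222, -103, 'Work', 2),
    (222,  -95, 'Car',  2);  -- hypothetical row that ties the maximum
""")

# ROW_NUMBER gives exactly one row the number 1 in each partition,
# so ties on the maximum signal collapse to a single row per id.
top_rows = conn.execute("""
SELECT id, signal, station, ownerid
FROM (
    SELECT r.*,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY signal DESC) AS rn
    FROM readings AS r
) AS q
WHERE rn = 1
ORDER BY id
""").fetchall()

for row in top_rows:
    print(row)
```

Which of the two tied rows is returned for ID 222 is not determined by this query; add a tie-breaker column to the ORDER BY inside OVER if that matters.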
with tab(id, sig, sta, oid) as
(
select 111 as id, -120 as signal, 'Home' as station, 1 as ownerId union all
select 111, -130, 'Car', 1 union all
select 111, -135, 'Work', 2 union all
select 222, -98, 'Home', 2 union all
select 222, -95, 'Work', 1 union all
select 222, -103, 'Work', 2
) ,
tabG(id, maxS) as
(
select id, max(sig) as sig from tab group by id
)
select g.*, p.* from tabG g
cross apply ( select top(1) * from tab t where t.id=g.id order by t.sig desc ) p
We can do this using a self join:
SELECT T1.ID,T1.Signal,T2.Station,T2.OwnerID
FROM (select ID,max(Signal) as Signal from mytable group by ID) T1
LEFT JOIN mytable T2
ON T1.ID=T2.ID and T1.Signal=T2.Signal;
Or you can also use the following query
SELECT t0.ID,t0.Signal,t0.Station,t0.OwnerID
FROM mytable t0
LEFT JOIN mytable t1 ON t0.ID=t1.ID AND t1.Signal>t0.Signal
WHERE t1.ID IS NULL;
select a.id, b.signal, a.station, a.ownerid from
mytable a
join
(SELECT ID, MAX(Signal) as Signal FROM mytable GROUP BY ID) b
on a.id = b.id AND a.Signal = b.Signal
SELECT s.*
FROM StatusTable s
WHERE s.Signal = (
SELECT MAX(m.Signal)
FROM StatusTable m
WHERE m.ID = s.ID
);
select
id,
signal,
station,
ownerid
FROM (
select t.*, rank() over(partition by id order by signal desc) as signal_rank from mytable t
) ranked
where signal_rank = 1;