I have a table with an INT ARRAY column representing some features (this is done instead of having a separate boolean column for each feature). The column is called feature_ids. If a record has a specific feature, the ID of that feature is present in the feature_ids column. For context, the feature IDs map as follows:
1: Fast
2: Expensive
3: Colorful
4: Deadly
In other words, I could have had 4 columns called is_fast, is_expensive, is_colorful and is_deadly, but I don't, since my real application has 100+ features and they change quite a bit.
Now back to the question: I want to run an aggregate like mode() over the records in the table, returning the most "frequent" features (e.g. whether it's more common to be "fast" than not). I want the result in the same format as the original feature_ids column, but with a feature's ID included ONLY if, within each group, it's more common for it to be present than not:
CREATE TABLE test (
id INT,
feature_ids integer[] DEFAULT '{}'::integer[],
age INT,
type character varying(255)
);
INSERT INTO test (id, age, feature_ids, type) VALUES (1, 10, '{1,2}', 'movie');
INSERT INTO test (id, age, feature_ids, type) VALUES (2, 2, '{1}', 'movie');
INSERT INTO test (id, age, feature_ids, type) VALUES (3, 9, '{1,2,4}', 'movie');
INSERT INTO test (id, age, feature_ids, type) VALUES (4, 11, '{1,2,3}', 'wine');
INSERT INTO test (id, age, feature_ids, type) VALUES (5, 12, '{1,2,4}', 'hat');
INSERT INTO test (id, age, feature_ids, type) VALUES (6, 12, '{1,2,3}', 'hat');
INSERT INTO test (id, age, feature_ids, type) VALUES (7, 8, '{1,4}', 'hat');
I want to run a query something like this:
SELECT
type, avg(age) as avg_age, mode() within group (order by feature_ids) as most_frequent_features
from test group by "type"
The result I expect is:
type   avg_age  most_frequent_features
hat    10.6     [1,2,4]
movie  7.0      [1,2]
wine   11.0     [1,2,3]
I have made an example here: https://www.db-fiddle.com/f/rTP4w7264vDC5rqjef6Nai/1
I find this quite tricky. The following is a rather brute-force approach -- calculating the "mode" explicitly and then bringing in the other aggregates:
select tf.type, t.avg_age,
array_agg(feature_id) as features
from (select t.type, feature_id, count(*) as cnt,
dense_rank() over (partition by t.type order by count(*) desc) as seqnum
from test t cross join
unnest(feature_ids) feature_id
group by t.type, feature_id
) tf join
(select t.type, avg(age) as avg_age
from test t
group by t.type
) t
on tf.type = t.type
where seqnum <= 2
group by tf.type, t.avg_age;
Here is a db<>fiddle.
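As a cross-check on the "more common than not" rule from the question, here is a minimal sketch that counts how many records in each group carry a feature and keeps it only when that count exceeds half the group size. It runs against SQLite from Python, and because SQLite has no array type or unnest(), it assumes a hypothetical normalized table test_features(id, feature_id) in place of the feature_ids column. This is not the query from the answer above, just one way to express the majority test.

```python
import sqlite3

# Assumption: features are stored one row per (record, feature) in a
# hypothetical test_features table, since SQLite lacks arrays/unnest().
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE test (id INT, age INT, type TEXT);
CREATE TABLE test_features (id INT, feature_id INT);
""")

rows = [
    (1, 10, [1, 2], "movie"),
    (2, 2, [1], "movie"),
    (3, 9, [1, 2, 4], "movie"),
    (4, 11, [1, 2, 3], "wine"),
    (5, 12, [1, 2, 4], "hat"),
    (6, 12, [1, 2, 3], "hat"),
    (7, 8, [1, 4], "hat"),
]
for rec_id, age, features, kind in rows:
    conn.execute("INSERT INTO test VALUES (?, ?, ?)", (rec_id, age, kind))
    conn.executemany(
        "INSERT INTO test_features VALUES (?, ?)",
        [(rec_id, f) for f in features],
    )

# Keep a feature when it appears in more than half of the group's records.
majority_sql = """
SELECT g.type, f.feature_id
FROM (SELECT type, COUNT(*) AS n FROM test GROUP BY type) AS g
JOIN (SELECT t.type, tf.feature_id, COUNT(*) AS cnt
      FROM test AS t
      JOIN test_features AS tf ON tf.id = t.id
      GROUP BY t.type, tf.feature_id) AS f
  ON f.type = g.type
WHERE 2 * f.cnt > g.n
ORDER BY g.type, f.feature_id
"""

majority = {}
for kind, feature_id in conn.execute(majority_sql):
    majority.setdefault(kind, []).append(feature_id)

print(majority)
```

On the sample data this reproduces the feature lists from the expected result in the question. A group in which no feature reaches a majority would simply be absent from the dictionary.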
I need to replace non-zero values in a column within a SELECT statement.
SELECT Status, Name, Car from Events;
I can do it like this:
SELECT Replace(Status, '1', 'Ready'), Name, Car from Events;
Or using CASE/UPDATE.
But I have numbers from -5 to 10, and writing a Replace or similar for each case is not a good idea.
How can I add a comparison to the replacement without updating the database?
Table looks like this:
Status Name Car
0 John Porsche
1 Bill Dodge
5 Megan Ford
The standard method is to use case:
select t.*,
(case when status = 1 then 'Ready'
else 'Something else'
end) as status_string
from t;
I would instead recommend, though, that you have a status reference table:
create table statuses (
status int primary key,
name varchar(255)
);
insert into statuses (status, name)
values (0, 'UNKNOWN'),
(1, 'READY'),
. . . -- for the rest of the statuses
Then use JOIN:
select t.*, s.name
from t join
statuses s
on t.status = s.status;
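One thing to watch with the inner join above: rows whose status has no entry in statuses drop out of the result. A minimal sketch of a LEFT JOIN variant follows, run against SQLite from Python. The fallback label built with COALESCE is a hypothetical choice for illustration, and the statuses table here deliberately contains only the two entries listed above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Events (Status INT, Name TEXT, Car TEXT);
INSERT INTO Events VALUES
    (0, 'John', 'Porsche'),
    (1, 'Bill', 'Dodge'),
    (5, 'Megan', 'Ford');

CREATE TABLE statuses (status INT PRIMARY KEY, name TEXT);
INSERT INTO statuses VALUES (0, 'UNKNOWN'), (1, 'READY');
""")

# LEFT JOIN keeps events whose status is not in the lookup table;
# COALESCE supplies a (hypothetical) fallback label for those rows.
labelled = conn.execute("""
SELECT COALESCE(s.name, 'status ' || e.Status) AS status_label,
       e.Name, e.Car
FROM Events AS e
LEFT JOIN statuses AS s ON s.status = e.Status
ORDER BY e.Status
""").fetchall()

for row in labelled:
    print(row)
```

Nothing is updated in Events; the labels exist only in the query result.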
SELECT IF(status =1, 'Approved', 'Pending') FROM TABLENAME
SELECT A.barName AS BarName1, B.barName AS BarName2
FROM (
SELECT Sells.barName, COUNT(barName) AS count
FROM Sells
GROUP BY barName
) AS A, B
WHERE A.count = B.count
I'm trying to do a self join on this table that I created, but I'm not sure how to alias the table twice in this format (i.e. FROM AS). Unfortunately, this is a school assignment where I can't create any new tables. Anyone have experience with this syntax?
Edit: For clarification, I'm using PostgreSQL 8.4. The schema for the tables I'm dealing with is as follows:
Drinkers(name, addr, hobby, frequent)
Bars(name, addr, owner)
Beers(name, brewer, alcohol)
Drinks(drinkerName, drinkerAddr, beerName, rating)
Sells(barName, beerName, price, discount)
Favorites(drinkerName, drinkerAddr, barName, beerName, season)
Again, this is for a school assignment, so I'm given read-only access to the above tables.
What I'm trying to find is pairs of bars (Name1, Name2) that sell the same set of drinks. My thinking in doing the above was to try and find pairs of bars that sell the same number of drinks, then list the names and drinks side by side (BarName1, Drink1, BarName2, Drink2) to try and compare if they are indeed the same set.
You have not mentioned what RDBMS you use.
If Oracle or MS SQL, you can do something like this (I use my sample data table, but you can try it with your tables):
create table some_data (
parent_id int,
id int,
name varchar(10)
);
insert into some_data values(1, 2, 'val1');
insert into some_data values(2, 3, 'val2');
insert into some_data values(3, 4, 'val3');
with data as (
select * from some_data
)
select *
from data d1
left join data d2 on d1.parent_id = d2.id
In your case this query
SELECT Sells.barName, COUNT(barName) AS count
FROM Sells
GROUP BY barName
should be placed in the WITH section and referenced from the main query twice, as A and B.
It is slightly unclear what you are trying to achieve. Are you looking for a list of bar names, with how many times they appear in the table? If so, there are a couple of ways you could do this. Firstly:
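To make the WITH suggestion concrete, here is a minimal sketch that defines the per-bar count once and joins it to itself under two aliases. It runs against SQLite from Python with made-up Sells rows (the bar and beer names are hypothetical), and the `a.barName < b.barName` condition is an addition to avoid listing a bar with itself or listing each pair twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Sells (barName TEXT, beerName TEXT);
INSERT INTO Sells VALUES
    ('Alpha', 'Bud'), ('Alpha', 'Stella'),
    ('Beta', 'Bud'), ('Beta', 'Stella'),
    ('Gamma', 'Bud');
""")

# The CTE is written once and aliased twice (a and b) in the main query.
pairs = conn.execute("""
WITH counts AS (
    SELECT barName, COUNT(*) AS cnt
    FROM Sells
    GROUP BY barName
)
SELECT a.barName AS BarName1, b.barName AS BarName2
FROM counts AS a
JOIN counts AS b
  ON a.cnt = b.cnt
 AND a.barName < b.barName
ORDER BY a.barName, b.barName
""").fetchall()

print(pairs)
```

Equal counts only narrow the candidates; two bars can sell the same number of beers without selling the same set, so the beer names still have to be compared afterwards.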
SELECT SellsA.barName AS BarName1, SellsB.count AS Count
FROM
(
SELECT DISTINCT barName
FROM Sells
) SellsA
LEFT JOIN
(
SELECT Sells.barName, COUNT(barName) AS count
FROM Sells
GROUP BY barName
) AS SellsB
ON SellsA.barName = SellsB.barName
Secondly, if you are using MSSQL:
SELECT barName, MAX(rn) AS Count
FROM
(
SELECT barName,
ROW_NUMBER() OVER(PARTITION BY barName ORDER BY barName) as rn
FROM Sells
) CountSells
GROUP BY barName
Thirdly, you could avoid a self-join in MSSQL, by using OVER():
SELECT
barName,
COUNT(*) OVER(PARTITION BY barName) AS Count
FROM Sells
Consider the following relationship.
I am trying to add a new row to Tenants from NewRentPayments when no matching Tenants tuple exists, based on the composite primary key (houseid, apartmentnumber).
Have a look at my query for a better idea:
insert into Tenants(houseid, apartmentnumber, leasetenantssn, leasestartdate, leaseexpirationdate, rent, lastrentpaiddate, rentoverdue)
(
select n.* from NewRentPayments as n left join Tenants as t
on
t.houseid = n.houseid
and
t.apartmentnumber = n.apartmentnumber
where
t.houseid is null
or
t.apartmentnumber is null
) as newval
(newval.houseid, newval.apartmentnumber, newval.leasetenantssn, now(), NULL, newval.rent, newval.datepaid, 'f');
It is giving error on as newval.
ERROR: syntax error at or near "as"
LINE 12: ) as newval
^
********** Error **********
ERROR: syntax error at or near "as"
SQL state: 42601
Character: 345
Note: This is not a simple insert value in one table from another table as done here. In my case I am inserting some constants/custom values too into the Tenants rows while inserting NewRentPayments tuples.
I am using Postgresql.
You need to put the values you want to insert into a SELECT.
Try this:
insert into Tenants (houseid, apartmentnumber, leasetenantssn, leasestartdate, leaseexpirationdate, rent, lastrentpaiddate, rentoverdue)
select newval.houseid, newval.apartmentnumber, newval.leasetenantssn, now(), null, newval.rent, newval.datepaid, 'f'
from (
select n.*
from NewRentPayments as n
left join Tenants as t on t.houseid = n.houseid
and t.apartmentnumber = n.apartmentnumber
where t.houseid is null
or t.apartmentnumber is null
) as newval;
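Here is a minimal sketch of the same INSERT ... SELECT pattern, run against SQLite from Python so it can be tried without a PostgreSQL server. The NewRentPayments columns are an assumption inferred from the query above, the sample rows are made up, and CURRENT_DATE stands in for now().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Tenants (
    houseid INT, apartmentnumber INT, leasetenantssn TEXT,
    leasestartdate TEXT, leaseexpirationdate TEXT,
    rent INT, lastrentpaiddate TEXT, rentoverdue TEXT,
    PRIMARY KEY (houseid, apartmentnumber)
);
-- Assumed column set for NewRentPayments (not shown in the question).
CREATE TABLE NewRentPayments (
    houseid INT, apartmentnumber INT, leasetenantssn TEXT,
    rent INT, datepaid TEXT
);
INSERT INTO Tenants VALUES
    (1, 1, '111-11-1111', '2024-01-01', '2025-12-31', 900, '2024-05-01', 't');
INSERT INTO NewRentPayments VALUES
    (1, 1, '111-11-1111', 900, '2024-06-01'),
    (2, 3, '222-22-2222', 750, '2024-06-02');
""")

# Constants and computed values go in the SELECT list, next to the
# columns copied from NewRentPayments; the anti-join filters out
# payments that already have a matching tenant.
conn.execute("""
INSERT INTO Tenants (houseid, apartmentnumber, leasetenantssn, leasestartdate,
                     leaseexpirationdate, rent, lastrentpaiddate, rentoverdue)
SELECT n.houseid, n.apartmentnumber, n.leasetenantssn, CURRENT_DATE,
       NULL, n.rent, n.datepaid, 'f'
FROM NewRentPayments AS n
LEFT JOIN Tenants AS t
       ON t.houseid = n.houseid
      AND t.apartmentnumber = n.apartmentnumber
WHERE t.houseid IS NULL
""")

tenants = conn.execute("""
SELECT houseid, apartmentnumber, leaseexpirationdate, rentoverdue
FROM Tenants
ORDER BY houseid, apartmentnumber
""").fetchall()

print(tenants)
```

Only the payment for (2, 3) produces a new tenant row; the existing tenant (1, 1) is left untouched.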
OK I have a table like this:
ID Signal Station OwnerID
111 -120 Home 1
111 -130 Car 1
111 -135 Work 2
222 -98 Home 2
222 -95 Work 1
222 -103 Work 2
This is all for the same day. I just need the Query to return the max signal for each ID:
ID Signal Station OwnerID
111 -120 Home 1
222 -95 Work 1
I tried using MAX() and the aggregation messes up with the Station and OwnerID being different for each record. Do I need to do a JOIN?
Something like this? Join your table with itself, and exclude the rows for which a higher signal was found.
select cur.id, cur.signal, cur.station, cur.ownerid
from yourtable cur
where not exists (
select *
from yourtable high
where high.id = cur.id
and high.signal > cur.signal
)
This would list one row for each highest signal, so there might be multiple rows per id.
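To see the tie behaviour, here is a minimal sketch running the same NOT EXISTS query against SQLite from Python. It uses the sample rows from the question plus one extra row, (222, -95, 'Car', 2), which is a made-up tie added for illustration; the ORDER BY is also an addition, only there to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE yourtable (id INT, signal INT, station TEXT, ownerid INT);
INSERT INTO yourtable VALUES
    (111, -120, 'Home', 1),
    (111, -130, 'Car', 1),
    (111, -135, 'Work', 2),
    (222, -98, 'Home', 2),
    (222, -95, 'Work', 1),
    (222, -103, 'Work', 2),
    (222, -95, 'Car', 2);  -- hypothetical tie with the row above
""")

# A row survives when no other row with the same id has a higher signal.
strongest = conn.execute("""
SELECT cur.id, cur.signal, cur.station, cur.ownerid
FROM yourtable AS cur
WHERE NOT EXISTS (
    SELECT 1
    FROM yourtable AS high
    WHERE high.id = cur.id
      AND high.signal > cur.signal
)
ORDER BY cur.id, cur.station
""").fetchall()

for row in strongest:
    print(row)
```

ID 222 comes back twice because both -95 rows are maxima; if exactly one row per ID is required, a tie-breaking rule has to be added.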
You are doing a group-wise maximum/minimum operation. This is a common trap: it feels like something that should be easy to do, but in SQL it aggravatingly isn't.
There are a number of approaches (both standard ANSI and vendor-specific) to this problem, most of which are sub-optimal in many situations. Some will give you multiple rows when more than one row shares the same maximum/minimum value; some won't. Some work well on tables with a small number of groups; others are more efficient for a larger number of groups with fewer rows per group.
Here's a discussion of some of the common ones (MySQL-biased but generally applicable). Personally, if I know there are no multiple maxima (or don't care about getting them) I often tend towards the null-left-self-join method, which I'll post as no-one else has yet:
SELECT reading.ID, reading.Signal, reading.Station, reading.OwnerID
FROM readings AS reading
LEFT JOIN readings AS highersignal
ON highersignal.ID=reading.ID AND highersignal.Signal>reading.Signal
WHERE highersignal.ID IS NULL;
In classic SQL-92 (not using the OLAP operations used by Quassnoi), you can use:
SELECT g.ID, g.MaxSignal, t.Station, t.OwnerID
FROM (SELECT id, MAX(Signal) AS MaxSignal
FROM t
GROUP BY id) AS g
JOIN t ON g.id = t.id AND g.MaxSignal = t.Signal;
(Unchecked syntax; assumes your table is 't'.)
The sub-query in the FROM clause identifies the maximum signal value for each id; the join combines that with the corresponding data row from the main table.
NB: if there are several entries for a specific ID that all have the same signal strength and that strength is the MAX(), then you will get several output rows for that ID.
Tested against IBM Informix Dynamic Server 11.50.FC3 running on Solaris 10:
+ CREATE TEMP TABLE signal_info
(
id INTEGER NOT NULL,
signal INTEGER NOT NULL,
station CHAR(5) NOT NULL,
ownerid INTEGER NOT NULL
);
+ INSERT INTO signal_info VALUES(111, -120, 'Home', 1);
+ INSERT INTO signal_info VALUES(111, -130, 'Car' , 1);
+ INSERT INTO signal_info VALUES(111, -135, 'Work', 2);
+ INSERT INTO signal_info VALUES(222, -98 , 'Home', 2);
+ INSERT INTO signal_info VALUES(222, -95 , 'Work', 1);
+ INSERT INTO signal_info VALUES(222, -103, 'Work', 2);
+ SELECT g.ID, g.MaxSignal, t.Station, t.OwnerID
FROM (SELECT id, MAX(Signal) AS MaxSignal
FROM signal_info
GROUP BY id) AS g
JOIN signal_info AS t ON g.id = t.id AND g.MaxSignal = t.Signal;
111 -120 Home 1
222 -95 Work 1
I named the table Signal_Info for this test - but it seems to produce the right answer.
This only shows that there is at least one DBMS that supports the notation. However, I am a little surprised that MS SQL Server does not - which version are you using?
It never ceases to surprise me how often SQL questions are submitted without table names.
WITH q AS
(
SELECT c.*, ROW_NUMBER() OVER (PARTITION BY id ORDER BY signal DESC) rn
FROM mytable c
)
SELECT *
FROM q
WHERE rn = 1
This will return one row even if there are duplicates of MAX(signal) for a given ID.
Having an index on (id, signal) will greatly improve this query.
with tab(id, sig, sta, oid) as
(
select 111 as id, -120 as signal, 'Home' as station, 1 as ownerId union all
select 111, -130, 'Car', 1 union all
select 111, -135, 'Work', 2 union all
select 222, -98, 'Home', 2 union all
select 222, -95, 'Work', 1 union all
select 222, -103, 'Work', 2
) ,
tabG(id, maxS) as
(
select id, max(sig) as sig from tab group by id
)
select g.*, p.* from tabG g
cross apply ( select top(1) * from tab t where t.id=g.id order by t.sig desc ) p
We can do this using a self join:
SELECT T1.ID,T1.Signal,T2.Station,T2.OwnerID
FROM (select ID,max(Signal) as Signal from mytable group by ID) T1
LEFT JOIN mytable T2
ON T1.ID=T2.ID and T1.Signal=T2.Signal;
Or you can also use the following query
SELECT t0.ID,t0.Signal,t0.Station,t0.OwnerID
FROM mytable t0
LEFT JOIN mytable t1 ON t0.ID=t1.ID AND t1.Signal>t0.Signal
WHERE t1.ID IS NULL;
select a.id, b.signal, a.station, a.ownerid from
mytable a
join
(SELECT ID, MAX(Signal) as Signal FROM mytable GROUP BY ID) b
on a.id = b.id AND a.Signal = b.Signal
SELECT s.* FROM StatusTable s
WHERE s.Signal = (
-- correlate on ID so each row is compared only with its own ID's maximum
SELECT MAX(m.Signal)
FROM StatusTable m
WHERE m.ID = s.ID
);
select
id,
signal,
station,
ownerid
FROM (
select t.*, rank() over(partition by id order by signal desc) as signal_rank from mytable t
) ranked
where signal_rank = 1;