selecting more than one row of matching data - sql

I need to select the number (cid) of a customer that has rented the same movie from 2 different branches. My tables are as follows:
RENTED
(cid, copyid)
12345 99999
12345 88888
COPY
(copyid, mid, bid)
99999 444 123
88888 444 456
So one customer (12345) has rented the same movie (444) from two different branches (123, 456). I am not sure how to compare values across two different records, where mid = mid but bid != bid. I tried to use 'some' and 'all', but this gives me no rows (code below):
select cid
from rented R join copy CP on R.copyid = CP.copyid
where CP.mid = all (select mid from copy where CP.mid = copy.mid) and CP.bid != some (select bid
from copy where CP.bid = copy.bid);
and my output should be
cid
12345

You could use the HAVING clause. The following query lists all customers who have rented the same movie from more than one branch:
SELECT r.cid
FROM rented r
JOIN copy p ON r.copyid = p.copyid
GROUP BY r.cid, p.mid
HAVING COUNT(DISTINCT p.bid) > 1
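
To check the grouping logic against the sample data from the question, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is an assumption, since the question does not name a database. The second customer (67890) and the copy 77777 are hypothetical rows added so that there is someone who should not be returned. The query refers to the copy table through the alias p throughout and adds DISTINCT so a customer with several qualifying movies is listed once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE rented (cid INTEGER, copyid INTEGER);
    CREATE TABLE copy (copyid INTEGER, mid INTEGER, bid INTEGER);
    -- 12345 is the customer from the question; 67890 is a hypothetical
    -- customer who rented movie 444 from a single branch only.
    INSERT INTO rented VALUES (12345, 99999), (12345, 88888), (67890, 77777);
    INSERT INTO copy VALUES (99999, 444, 123), (88888, 444, 456), (77777, 444, 123);
""")

query = """
    SELECT DISTINCT r.cid
    FROM rented r
    JOIN copy p ON r.copyid = p.copyid
    GROUP BY r.cid, p.mid              -- one group per customer and movie
    HAVING COUNT(DISTINCT p.bid) > 1   -- keep groups spanning 2+ branches
"""
customers = [row[0] for row in conn.execute(query)]
print(customers)
```

Only customer 12345 has a (customer, movie) group that spans two branches, so that is the only cid the query should return.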

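Using a single pass on each table (comparing the lowest and highest branch per customer and movie, so that repeat rentals from the same branch are not counted as different branches):
select distinct cid from (
select r.cid,
min(c.bid) over (partition by r.cid, c.mid) min_branch,
max(c.bid) over (partition by r.cid, c.mid) max_branch
from rented r, copy c
where r.copyid = c.copyid) t
where min_branch <> max_branch;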

Related

How can I count unique attribute values using two attributes and joining two tables?

I'm a beginner in SQL.
Simplified, I have two tables, districts and streetdistricts, which contain information about city districts and streets. Every district has a unique number dkey and every street has a unique street number stkey (as primary keys respectively).
Here's an example:
Table districts:
dkey | name
1 | Inner City
2 | Outer City
3 | Outskirts
Table streetdistricts:
stkey | dkey
113 | 1
126 | 2
148 | 2
148 | 3
152 | 3
154 | 3
What I want to do now is to find out how many streets are there per district that are located only in one single district. So that means I do not have to just remove duplicates (like street with stkey 148 here), but instead to remove streets that are situated in more than one district completely so that I only see the districts and the number of streets per district that are just located in one district only.
For this example, this would be:
name number_of_street_in_just_this_district
Inner City 1
Outer City 1
Outskirts 2
I've tried many things, but I always get stuck. When I SELECT the name of the district, SQL requires it in the GROUP BY as well, but once I add it I get either the total number of rows (here: 6) or the number of distinct streets, which still includes the shared street (here: 5), and not the right answer of 4.
Or I'm not able to JOIN the tables correctly so to get the output I want. Here is my last try:
SELECT SUM(StreetDistricts.dkey) as d_number, StreetDistricts.stkey, COUNT(StreetDistricts.stkey) as numb
FROM StreetDistricts
INNER JOIN Districts ON Districts.dkey = StreetDistricts.dkey
GROUP BY StreetDistricts.stkey
HAVING COUNT(StreetDistricts.dkey) = 1
ORDER BY d_number DESC
This works to get me the correct sum of rows, but I was not able to combine/join it with the other table to receive name and number of unique streets.
First obtain the streets that are found in only one district (cte1). Then count just those streets per district. This should do it:
WITH cte1 AS (
SELECT stkey FROM StreetDistricts GROUP BY stkey HAVING COUNT(DISTINCT dkey) = 1
)
SELECT d.name, COUNT(*) AS n
FROM StreetDistricts AS s
JOIN Districts AS d
ON s.dkey = d.dkey
AND s.stkey IN (SELECT stkey FROM cte1)
GROUP BY d.dkey
;
Result:
+------------+---+
| name | n |
+------------+---+
| Inner City | 1 |
| Outer City | 1 |
| Outskirts | 2 |
+------------+---+
Note: I used the fact that dkey is the primary key of Districts to avoid having to GROUP BY d.name as well. This is guaranteed by functional dependence. If your database doesn't guarantee that with a constraint, just add d.name to the final GROUP BY terms.
The test case:
CREATE TABLE Districts (dkey int primary key, name varchar(30));
CREATE TABLE StreetDistricts (stkey int, dkey int);
INSERT INTO Districts VALUES
(1,'Inner City')
, (2,'Outer City')
, (3,'Outskirts')
;
INSERT INTO StreetDistricts VALUES
(113,1)
, (126,2)
, (148,2)
, (148,3)
, (152,3)
, (154,3)
;
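
As a quick way to re-run the test case above, here is a sketch using Python's built-in sqlite3 module (SQLite is an assumption; the question does not say which database is in use). The CTE name single_district is my own label for the answer's cte1, and the sketch groups by both d.dkey and d.name so it does not rely on functional-dependence support.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Districts (dkey INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE StreetDistricts (stkey INTEGER, dkey INTEGER);
    INSERT INTO Districts VALUES
        (1, 'Inner City'), (2, 'Outer City'), (3, 'Outskirts');
    INSERT INTO StreetDistricts VALUES
        (113, 1), (126, 2), (148, 2), (148, 3), (152, 3), (154, 3);
""")

query = """
    WITH single_district AS (
        -- streets that appear in exactly one district
        SELECT stkey
        FROM StreetDistricts
        GROUP BY stkey
        HAVING COUNT(DISTINCT dkey) = 1
    )
    SELECT d.name, COUNT(*) AS n
    FROM StreetDistricts AS s
    JOIN Districts AS d ON s.dkey = d.dkey
    WHERE s.stkey IN (SELECT stkey FROM single_district)
    GROUP BY d.dkey, d.name
    ORDER BY d.dkey
"""
rows = conn.execute(query).fetchall()
for name, n in rows:
    print(name, n)
```

Street 148 is dropped because it belongs to districts 2 and 3, which leaves one street each for Inner City and Outer City and two for Outskirts.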

Delete records with duplicates and join in another table

I need to write a query (Microsoft SQL Server) to delete duplicates in the table Vehicle that have Vehicle.CarId = Car.CarId and having the same concatenation (CarId, CounterLimit, Kilometers).
Table Car:
CarId
-----
11111
Table Vehicle:
VehicleId CarId CounterLimit Kilometers
-----------------------------------------------------
1 11111 250 120000
2 23456 300 150000
3 11111 250 120000 (record duplicated with 1, should be deleted)
Could you please help me?
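Delete the rows with the lower VehicleId in each duplicate group, keeping the highest one (to keep VehicleId 1 instead, as in the question, flip the comparison to v2.VehicleId < v.VehicleId):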
delete v
from Vehicle v
where exists (
select 1
from Vehicle v2
where v2.VehicleId > v.VehicleId
and v2.CarId = v.CarId and v2.CounterLimit = v.CounterLimit and v2.Kilometers = v.Kilometers)
To just query the table
select max(vehicleid) vehicleid, carid, CounterLimit, Kilometers
from Vehicle
group by carid, CounterLimit, Kilometers
Joining the table:
First, number the rows within each group of CarId, CounterLimit and Kilometers. Rows where all three match are considered duplicates; add or remove columns in the partition to change that criterion.
Then keep just one row per group, eliminating the duplicates, by filtering on rank_ = 1.
with rank as (
select
vehicle.vehicleid,
vehicle.carid,
vehicle.CounterLimit,
vehicle.Kilometers,
row_number() over(partition by vehicle.carid,vehicle.CounterLimit, vehicle.Kilometers order by vehicle.vehicleid) as rank_
from vehicle
join car
on vehicle.CarId = car.CarId
)
select * from rank where rank_ = 1
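
For the actual delete, one possible variant keeps the lowest VehicleId per group (as the question asks) and only touches vehicles whose CarId exists in Car. This is a sketch run against SQLite through Python's sqlite3 module, not SQL Server, so treat the exact syntax as an assumption to verify on your own server; the statement uses only plain subqueries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Car (CarId INTEGER PRIMARY KEY);
    CREATE TABLE Vehicle (
        VehicleId INTEGER PRIMARY KEY,
        CarId INTEGER,
        CounterLimit INTEGER,
        Kilometers INTEGER
    );
    INSERT INTO Car VALUES (11111);
    INSERT INTO Vehicle VALUES
        (1, 11111, 250, 120000),
        (2, 23456, 300, 150000),
        (3, 11111, 250, 120000);
""")

with conn:  # commit on success, roll back on error
    conn.execute("""
        DELETE FROM Vehicle
        WHERE CarId IN (SELECT CarId FROM Car)       -- only cars listed in Car
          AND VehicleId NOT IN (                     -- keep the first row per group
              SELECT MIN(VehicleId)
              FROM Vehicle
              GROUP BY CarId, CounterLimit, Kilometers
          )
    """)

remaining = [row[0] for row in
             conn.execute("SELECT VehicleId FROM Vehicle ORDER BY VehicleId")]
print(remaining)
```

With the sample rows, VehicleId 3 is removed as a duplicate of 1, while VehicleId 2 is left alone because its CarId is not in Car and it has no duplicate anyway.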

Finding lowest Ids of duplicates and updating tables according to these Ids

The problem
I have a sql database with a table for Hashtags, of which many are duplicates with regard to their names.
A statement like
SELECT *
FROM HashTag
ORDER BY Name
returns something like
Id | Name
1947 | test
1950 | sample
1962 | test
1963 | sample
1986 | test
2014 | example
I want to keep only the hashtag with the lowest Id for each Name (1947 for 'test' and 1950 for 'sample') and update other tables with this Id, replacing the higher Ids (example: updating hashtag 'test'; lowest Id = 1947, higher Ids = 1962, 1986). These sql statements are updated manually as of now and would be as follows:
UPDATE HashTaggedActivity
SET [HashTag_id] = 1947
WHERE HashTag_id in (1962, 1986)
Update HashTaggedGroup
SET [HashTag_id] = 1947
WHERE HashTag_id in (1962, 1986)
DELETE ht
FROM HashTag ht
WHERE ht.Id in (1962, 1986)
After this I have to do this for HashTag 'sample', which is an error prone and tedious process. The HashTag 'example' is not a duplicate and should not result in updating other tables.
Is there any way to write an sql statement for doing this for each occurrence of duplicate names in the table HashTag?
What I've tried so far
I think I have to combine a statement for getting a duplicate count ordered by Id
select ht.Id, ht.Name, htc.dupeCount
from HashTag ht
inner join (
SELECT ht.Name, COUNT(*) AS dupeCount
FROM HashTag ht
GROUP BY ht.Name
HAVING COUNT(*) > 1
) htc on ht.Name = htc.Name
ORDER BY Id
which gives
Id | Name | dupeCount
1947 | test | 3
1950 | sample | 2
1962 | test | 3
1963 | sample | 2
1986 | test | 3
2014 | example | 1
with my UPDATE and DELETE statements according to the dupeCount, but I'm not sure how to do this ;-)
Thanks in advance and best regards,
Michael
The first two update statements first get the name based on the hashtag_id (innermost select), then get the minimum of all ids in hashtag that share the same name (next select) to then update the hashtag_id accordingly.
In this case, it will also update the records with hashtag_id 1947 and 1950 - but the new value will be identical to the old value.
update HashTaggedGroup
set hashtag_id =
(select min(id)
from hashtag h1
where (
select name
from hashtag h2
where h2.id=HashTaggedGroup.hashtag_id)=h1.name);
update HashTaggedActivity
set hashtag_id =
(select min(id)
from hashtag h1
where (
select name
from hashtag h2
where h2.id=HashTaggedActivity.hashtag_id)=h1.name);
The delete below works for MySQL and SQL Server; it may need adjusting for other databases (the idea remains the same, though). If you are certain that all ids from hashtag are present in HashTaggedActivity, the query could be simplified.
delete h1 from hashtag as h1
inner join hashtag as h2 on
h1.name = h2.name and
h1.id > h2.id;
I would use window functions:
with ht as (
select ht.*, min(id) over (partition by name) as minid
from hashtag ht
)
update hta
set hashtag_id = ht.minid
from HashTaggedActivity hta join
ht
on hta.hashtag_id = ht.id
where ht.minid <> hta.hashtag_id;
And then do the delete in a similar way:
with ht as (
select ht.*, min(id) over (partition by name) as minid
from hashtag ht
)
delete from ht
where ht.minid <> id;
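
The update-then-delete order matters: the referencing tables must be repointed before the duplicate hashtags disappear. Below is a minimal sketch of that sequence using Python's sqlite3 module. SQLite, the ActivityId column and the four activity rows are assumptions made for illustration; the same pattern would be repeated for HashTaggedGroup.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE HashTag (Id INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE HashTaggedActivity (ActivityId INTEGER, HashTag_id INTEGER);
    INSERT INTO HashTag VALUES
        (1947, 'test'), (1950, 'sample'), (1962, 'test'),
        (1963, 'sample'), (1986, 'test'), (2014, 'example');
    -- hypothetical references, two of them to duplicate hashtags
    INSERT INTO HashTaggedActivity VALUES (1, 1962), (2, 1986), (3, 1950), (4, 2014);
""")

with conn:  # run both steps in one transaction
    # Step 1: repoint every reference to the lowest Id sharing the same Name.
    conn.execute("""
        UPDATE HashTaggedActivity
        SET HashTag_id = (
            SELECT MIN(h1.Id)
            FROM HashTag h1
            JOIN HashTag h2 ON h1.Name = h2.Name
            WHERE h2.Id = HashTaggedActivity.HashTag_id
        )
    """)
    # Step 2: remove every hashtag that is not the lowest Id for its Name.
    conn.execute("""
        DELETE FROM HashTag
        WHERE Id NOT IN (SELECT MIN(Id) FROM HashTag GROUP BY Name)
    """)

activities = conn.execute(
    "SELECT ActivityId, HashTag_id FROM HashTaggedActivity ORDER BY ActivityId"
).fetchall()
hashtags = [row[0] for row in conn.execute("SELECT Id FROM HashTag ORDER BY Id")]
print(activities, hashtags)
```

Running both statements inside one transaction means a failure in the delete leaves the references untouched as well.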

Invalid count and sum in cross tab query using PostgreSQL

I am using PostgreSQL 9.3 version database.
I want to count the number of sales per product, sum the sale amounts per product, and also show each city where the product was sold as its own column.
Example
Setup
create table products (
name varchar(20),
price integer,
city varchar(20)
);
insert into products values
('P1',1200,'London'),
('P1',100,'Melborun'),
('P1',1400,'Moscow'),
('P2',1560,'Munich'),
('P2',2300,'Shunghai'),
('P2',3000,'Dubai');
Crosstab query:
select * from crosstab (
'select name,count(*),sum(price),city,count(city)
from products
group by name,city
order by name,city
'
,
'select distinct city from products order by 1'
)
as tb (
name varchar(20),TotalSales bigint,TotalAmount bigint,London bigint,Melborun bigint,Moscow bigint,Munich bigint,Shunghai bigint,Dubai bigint
);
Output
name totalsales totalamount london melborun moscow munich shunghai dubai
---------------------------------------------------------------------------------------------------------
P1 1 1200 1 1 1
P2 1 3000 1 1 1
Expected Output:
name | totalsales | totalamount | london | melborun | moscow | munich | shunghai | dubai
---------------------------------------------------------------------------------------------------------
P1   | 3          | 2700        | 1      | 1        | 1      |        |          |
P2   | 3          | 6860        |        |          |        | 1      | 1        | 1
Your first mistake seems to be simple. According to the 2nd parameter of the crosstab() function, 'Dubai' must come as first city (sorted by city). Details:
PostgreSQL Crosstab Query
The unexpected values for totalsales and totalamount represent values from the first row for each name group. "Extra" columns are treated like that. Details:
Pivot on Multiple Columns using Tablefunc
To get sums per name, run window functions over your aggregate functions. Details:
Get the distinct sum of a joined table column
select * from crosstab (
'select name
,sum(count(*)) OVER (PARTITION BY name)
,sum(sum(price)) OVER (PARTITION BY name)
,city
,count(city)
from products
group by name,city
order by name,city
'
-- ,'select distinct city from products order by 1' -- replaced
,$$SELECT unnest('{Dubai,London,Melborun
,Moscow,Munich,Shunghai}'::varchar[])$$
) AS tb (
name varchar(20), TotalSales bigint, TotalAmount bigint
,Dubai bigint
,London bigint
,Melborun bigint
,Moscow bigint
,Munich bigint
,Shunghai bigint
);
Better yet, provide a static set as the 2nd parameter. The output columns are hard-coded anyway, so generating the data columns dynamically is unreliable: if you add another row with a new city, the query would break.
This way you can also order your columns as you like. Just keep output columns and 2nd parameter in sync.
Honestly, I think your database needs some drastic normalization, and returning the results in several columns (one for each city name) is not something I would do myself.
Nevertheless, if you want to stick with it, you can do it this way.
As a first step you need to get the correct amounts. This does the trick:
select name, count(1) totalsales, sum(price) totalAmount
from products
group by name;
This will be your result:
NAME TOTALSALES TOTALAMOUNT
P2 3 6860
P1 3 2700
You would get the Products/City this way:
select name, city, count(1) totalCityName
from products
group by name, city
order by name, city;
This result:
NAME CITY TOTALCITYNAME
P1 London 1
P1 Melborun 1
P1 Moscow 1
P2 Dubai 1
P2 Munich 1
P2 Shunghai 1
If you really would like a column per city you could do something like:
select name,
count(1) totalsales,
sum(price) totalAmount,
(select count(1)
from Products a
where a.City = 'London' and a.name = p.name) London,
...
from products p
group by name;
But I would not recommend it!!!
This would be the result:
NAME TOTALSALES TOTALAMOUNT LONDON ...
P1 3 2700 1
P2 3 6860 0
Demonstration here.
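
A common alternative to one correlated subquery per city is conditional aggregation, where each city column is a SUM over a CASE expression inside the same GROUP BY. The sketch below runs it on the sample data with Python's sqlite3 module; using SQLite instead of PostgreSQL is an assumption made to keep the example self-contained, and the lower-case column aliases are my own. Unlike the expected output in the question, cities without sales show 0 here and not an empty cell.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (name TEXT, price INTEGER, city TEXT);
    INSERT INTO products VALUES
        ('P1', 1200, 'London'), ('P1', 100, 'Melborun'), ('P1', 1400, 'Moscow'),
        ('P2', 1560, 'Munich'), ('P2', 2300, 'Shunghai'), ('P2', 3000, 'Dubai');
""")

query = """
    SELECT name,
           COUNT(*)   AS totalsales,
           SUM(price) AS totalamount,
           -- one counter per city; each row adds 1 to its own city column
           SUM(CASE WHEN city = 'London'   THEN 1 ELSE 0 END) AS london,
           SUM(CASE WHEN city = 'Melborun' THEN 1 ELSE 0 END) AS melborun,
           SUM(CASE WHEN city = 'Moscow'   THEN 1 ELSE 0 END) AS moscow,
           SUM(CASE WHEN city = 'Munich'   THEN 1 ELSE 0 END) AS munich,
           SUM(CASE WHEN city = 'Shunghai' THEN 1 ELSE 0 END) AS shunghai,
           SUM(CASE WHEN city = 'Dubai'    THEN 1 ELSE 0 END) AS dubai
    FROM products
    GROUP BY name
    ORDER BY name
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The city list is still hard-coded, so a new city needs a new column in the query, the same limitation the crosstab version has.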

Selecting a row based on column value

I have a select statement in SQL. The select statement is selecting a licenseNo and a LicenseID. Basically, I want it to return the LicenseNo depending on which LicenseTypeID it is.
For example, I want it to return the LicenseNo if the LicenseTypeID = 6 first, then if there is no ID that equals 6, return the LicenseNo where the LicenseTypeID = 5 and so on.
Right now, I have a join that is causing multiple LicenseNos to be returned because there are multiple LicenseTypeIDs. I only want it to return the LicenseNo and row in which the ID of 6 takes precedence, then 5, then 4 and so on. It looks something like this right now:
Select a.Name,
a.addressNo,
b.LicenseNo,
LicenseTypeID
from addressbook a
join licenses b
on a.addressNo = b.addressNo
Returns
111 CompanyA 1234 6
111 CompanyA 2222 4
So I only want it to return the first row, and if that ID (6) doesn't exist, I want it to return the second row with ID 4.
You need a subselect to determine the highest license type ID for each address:
select
a.name,
a.addressno,
l.licenseno,
l.licensetypeid
from addressbook a
join licenses l on l.addressno = a.addressno
where l.licensetypeid =
(
select max(licensetypeid)
from licenses
where licenses.addressno = a.addressno
);
Try this.
SELECT * FROM
(SELECT ROW_NUMBER() OVER
(PARTITION BY a.addressNo ORDER BY b.LicenseTypeID DESC) rn,
a.Name,
a.addressNo,
b.LicenseNo,
b.LicenseTypeID
from addressbook a
join licenses b
on a.addressNo = b.addressNo) AS t WHERE rn = 1
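
To see the "highest license type per address" ranking in action, here is a small sketch using Python's sqlite3 module. It assumes an SQLite build with window-function support (3.25 or later), partitions by address and orders by LicenseTypeID descending. CompanyB and its two licenses are hypothetical rows added to show the fallback when no type 6 exists.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE addressbook (addressNo INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE licenses (addressNo INTEGER, LicenseNo INTEGER, LicenseTypeID INTEGER);
    INSERT INTO addressbook VALUES (111, 'CompanyA'), (222, 'CompanyB');
    -- CompanyA rows come from the question; CompanyB rows are hypothetical.
    INSERT INTO licenses VALUES
        (111, 1234, 6), (111, 2222, 4),
        (222, 3333, 5), (222, 4444, 2);
""")

query = """
    SELECT Name, addressNo, LicenseNo, LicenseTypeID
    FROM (
        SELECT a.Name, a.addressNo, b.LicenseNo, b.LicenseTypeID,
               -- rn = 1 marks the highest LicenseTypeID within each address
               ROW_NUMBER() OVER (
                   PARTITION BY a.addressNo
                   ORDER BY b.LicenseTypeID DESC
               ) AS rn
        FROM addressbook a
        JOIN licenses b ON a.addressNo = b.addressNo
    ) AS t
    WHERE rn = 1
    ORDER BY addressNo
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

CompanyA returns its type 6 license, and CompanyB, which has no type 6, falls back to its type 5 license.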