SQLite3 - Counting number of duplicate and non-duplicate books each user owns

I'm creating a database that keeps track of books, users, and which books each user owns. A user can own several copies of the same book, identified by its book id. For each username, I want to show the total number of books they own (counting duplicates) and the number of distinct books they own. My attempt is below, but the numbers are not correct. For example, my select statement reports that Sammy's total including duplicates is 4 and his total of non-duplicated books is 3.
Looking at the data in the owns table, the real values are: Sammy's total including duplicates is 3+2+1+1 = 7 books, and his number of non-duplicated books is the number of unique book_ids in his collection, which is 4.
I'm not sure what's wrong with the logic of my query and would appreciate some help.
Schema:
CREATE TABLE IF NOT EXISTS books(
id integer NOT NULL primary key UNIQUE,
title text NOT NULL UNIQUE,
genre text NOT NULL,
price integer NOT NULL,
units_available integer NOT NULL
);
CREATE TABLE IF NOT EXISTS users(
username text primary key NOT NULL UNIQUE,
password text NOT NULL
);
CREATE TABLE IF NOT EXISTS owns(
owners_username integer NOT NULL,
book_id integer NOT NULL,
quantity integer NOT NULL,
PRIMARY KEY (owners_username, book_id),
FOREIGN KEY (owners_username) REFERENCES users (username),
FOREIGN KEY (book_id) REFERENCES books (id)
);
Everything in owns table / all books and their quantities owned by each user:
select * from owns;
owners_username book_id quantity
--------------- --------- --------
Bobby 47911 1
Bobby 49286 1
Bobby 55622 1
Sammy 50818 3
Sammy 49290 2
Sammy 55617 1
Sammy 6555 1
Andrew 50546 1
Andrew 49290 4
Andrew 48401 1
When I attempt to count the number of duplicated and non-duplicated books each user owns:
select owners_username, dup_count, nodup_count
from
(select owners_username, count(quantity) as dup_count
from (select owners_username, quantity from owns)
group by owners_username)
natural join
(select owners_username, count(quantity) as nodup_count
from (select distinct owners_username, quantity from owns)
group by owners_username);
owners_username dup_count nodup_count
--------------- --------- -----------
Andrew 3 2
Bobby 3 1
Sammy 4 3

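Your query counts rows, not copies: COUNT(quantity) counts the rows per user, and the DISTINCT subquery collapses rows of the same user that share the same quantity. What you need for each user is the sum of the column quantity and the number of distinct book_ids: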
SELECT owners_username,
SUM(quantity) dup_count,
COUNT(DISTINCT book_id) nodup_count
FROM owns
GROUP BY owners_username
Results:
owners_username  dup_count  nodup_count
---------------  ---------  -----------
Andrew           6          3
Bobby            3          3
Sammy            7          4
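To check the answer's query against the question's data, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The table is reduced to the owns columns only (no foreign keys), and owners_username is declared as TEXT; both are simplifications of mine, not part of the original schema.

```python
import sqlite3

# In-memory database holding only the "owns" table from the question.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE owns (
        owners_username TEXT NOT NULL,   -- assumption: TEXT instead of integer
        book_id         INTEGER NOT NULL,
        quantity        INTEGER NOT NULL,
        PRIMARY KEY (owners_username, book_id)
    )
""")
conn.executemany(
    "INSERT INTO owns VALUES (?, ?, ?)",
    [
        ("Bobby", 47911, 1), ("Bobby", 49286, 1), ("Bobby", 55622, 1),
        ("Sammy", 50818, 3), ("Sammy", 49290, 2), ("Sammy", 55617, 1),
        ("Sammy", 6555, 1),
        ("Andrew", 50546, 1), ("Andrew", 49290, 4), ("Andrew", 48401, 1),
    ],
)

# SUM(quantity) adds up the copies; COUNT(DISTINCT book_id) counts the titles.
rows = conn.execute("""
    SELECT owners_username,
           SUM(quantity)           AS dup_count,
           COUNT(DISTINCT book_id) AS nodup_count
    FROM owns
    GROUP BY owners_username
    ORDER BY owners_username
""").fetchall()

for row in rows:
    print(row)
```

Because (owners_username, book_id) is the primary key, COUNT(*) would give the same nodup_count here; COUNT(DISTINCT book_id) simply states the intent more directly.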

Related

Own id for every unique name in the table?

Is it possible to make a table that has like auto-incrementing id's for every unique name that I make in the table?
For example:
ID NAME_ID NAME
----------------------
1 1 John
2 1 John
3 1 John
4 2 Mary
5 2 Mary
6 3 Sarah
7 4 Lucas
and so on.
Use the window function rank() to get a unique id per name. Or dense_rank() to get the same without gaps:
SELECT id, dense_rank() OVER (ORDER BY name) AS name_id, name
FROM tbl;
I would advise against writing that redundant information to your table, since you can generate the number on the fly. Alternatively, don't store name redundantly in that table at all: name would typically live in another table, with name_id as PRIMARY KEY.
Then you have a "names" table and run "SELECT or INSERT" there to get a unique name_id for every new entry in the main table. See:
Is SELECT or INSERT in a function prone to race conditions?
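The "names" table idea can be sketched roughly as follows, using Python's built-in sqlite3 module. The table names (names, tbl) and the helper get_name_id are hypothetical, and INSERT OR IGNORE is SQLite-specific syntax; other databases spell the "insert if missing" step differently. The sketch is single-connection and does not address the race conditions discussed in the linked question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One row per distinct name; name_id is assigned by SQLite.
    CREATE TABLE names (
        name_id INTEGER PRIMARY KEY,
        name    TEXT NOT NULL UNIQUE
    );
    -- Main table stores only the reference, not the name itself.
    CREATE TABLE tbl (
        id      INTEGER PRIMARY KEY,
        name_id INTEGER NOT NULL REFERENCES names (name_id)
    );
""")


def get_name_id(conn, name):
    """Return the name_id for name, inserting the name first if it is new."""
    conn.execute("INSERT OR IGNORE INTO names (name) VALUES (?)", (name,))
    row = conn.execute(
        "SELECT name_id FROM names WHERE name = ?", (name,)
    ).fetchone()
    return row[0]


# Same sequence of names as in the question's example.
for name in ["John", "John", "John", "Mary", "Mary", "Sarah", "Lucas"]:
    conn.execute("INSERT INTO tbl (name_id) VALUES (?)", (get_name_id(conn, name),))

rows = conn.execute("""
    SELECT t.id, t.name_id, n.name
    FROM tbl AS t
    JOIN names AS n ON n.name_id = t.name_id
    ORDER BY t.id
""").fetchall()

for row in rows:
    print(row)
```

With this layout the name_id values follow the order in which names first appear, which matches the example in the question, whereas dense_rank() OVER (ORDER BY name) numbers them alphabetically.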
First add the column to the table.
ALTER TABLE yourtable
ADD [UID] INT NULL;
ALTER TABLE yourtable
ADD constraint fk_yourtable_uid_id foreign key ([UID]) references yourtable([Serial]);
Then you can update the UID with the minimum Serial ID per Name.
UPDATE t
SET [UID] = q.[UID]
FROM yourtable t
JOIN
(
SELECT Name, MIN([Serial]) AS [UID]
FROM yourtable
GROUP BY Name
) q ON q.Name = t.Name
WHERE (t.[UID] IS NULL OR t.[UID] != q.[UID]) -- Repeatability
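For readers without a SQL Server instance, here is a rough sketch of the same idea in SQLite via Python's sqlite3 module. The UPDATE ... FROM ... JOIN form above is rewritten as a correlated subquery, which is my substitution; the lower-case column names and the sample rows (taken from the question) are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE yourtable (
        serial INTEGER PRIMARY KEY,
        name   TEXT NOT NULL,
        uid    INTEGER REFERENCES yourtable (serial)
    )
""")
names = ["John", "John", "John", "Mary", "Mary", "Sarah", "Lucas"]
conn.executemany("INSERT INTO yourtable (name) VALUES (?)", [(n,) for n in names])

# Set uid to the smallest serial that carries the same name.
conn.execute("""
    UPDATE yourtable
    SET uid = (
        SELECT MIN(q.serial)
        FROM yourtable AS q
        WHERE q.name = yourtable.name
    )
""")

rows = conn.execute(
    "SELECT serial, uid, name FROM yourtable ORDER BY serial"
).fetchall()

for row in rows:
    print(row)
```

Note that the resulting ids are unique per name but not consecutive (1, 4, 6, 7 for this data), unlike the 1, 2, 3, 4 shown in the question.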

SQL comparison report on cartesian product using subquery

I'm a student building a comparison report query in MySQL on a database that tracks customers, products, and purchases in separate tables. I have to create a report that shows how many products were sold every month for each province, using a subquery. I was told to use a cross join between product and customer. However, when I try to group the results, the records all collapse into each other, and I don't understand why. I'm also not sure this is the correct approach, since my customer and product tables have no values in common except through the purchase table.
These are my create table scripts:
CREATE TABLE `customer` (
`CustomerID` INT NOT NULL,
`City` VARCHAR(100) NOT NULL,
`Province` CHAR(2) NOT NULL,
PRIMARY KEY (`CustomerID`));
CREATE TABLE `product` (
`ProductID` INT NOT NULL,
`ProductName` VARCHAR(100) NOT NULL,
`Price` DECIMAL(5,2) NOT NULL,
PRIMARY KEY (`ProductID`));
CREATE TABLE `purchase` (
`PurchaseID` INT NOT NULL,
`PurchaseDate` DATE NOT NULL,
`customer_CustomerID` INT NOT NULL,
`product_ProductID` INT NOT NULL,
PRIMARY KEY (`PurchaseID`),
CONSTRAINT `fk_purchase_customer`
FOREIGN KEY (`customer_CustomerID`)
REFERENCES `customer` (`CustomerID`),
CONSTRAINT `fk_purchase_product`
FOREIGN KEY (`product_ProductID`)
REFERENCES `product` (`ProductID`));
This is the query that I have written as I have understood the instructions.
SELECT DISTINCT province, productName AS Product, JanTotalSales
FROM PRODUCT cross join CUSTOMER
LEFT JOIN
(
SELECT purchaseID, product_productID, customer_customerID, COUNT(purchaseDate) AS JanTotalSales
FROM PURCHASE
WHERE MONTH(purchaseDate) = 01
)JAN ON PRODUCT.productID = JAN.product_productID
GROUP BY province, productID;
I should be getting results like this
Province  Product  JanTotalSales  FebTotalSales  ...  TotalSales
--------  -------  -------------  -------------  ---  ----------
QC        Paper    1              NULL           ...  1
ON        Paper    1              2              ...  3
AB        Paper    1              NULL           ...  1
AB        Wire     2              2              ...  4
ON        Wire     2              1              ...  3
NULL      Kit      NULL           NULL           ...  NULL
SK        Gummy    1              1              ...  2
NULL      Bag      NULL           NULL           ...  NULL
However, I receive results like this when I do it on the January subquery.
Province  Product  JanTotalSales
--------  -------  -------------
AB        Paper    NULL
AB        Wire     NULL
AB        Kit      NULL
AB        Kit      13
ON        Paper    NULL
ON        Wire     NULL
ON        Kit      NULL
ON        Kit      13
I appreciate whatever help you can give to show me where I'm going wrong. From what I understand it's something to do with how the grouping occurs but I can't figure out why.

How to make a join in a hierarchical query

I have these two tables:
CREATE TABLE Category
(
curCategory VARCHAR2(50) PRIMARY KEY,
parentCategory VARCHAR(50),
CONSTRAINT check_cat CHECK (curCategory is not null),
CONSTRAINT fk_category1 FOREIGN KEY(parentCategory) REFERENCES Category(curCategory)
);
CREATE TABLE Article
(
name VARCHAR(50)
artCategory VARCHAR(50),
CONSTRAINT pk_article PRIMARY KEY(name),
CONSTRAINT fk_artCategory FOREIGN KEY(artCategory) REFERENCES Category(curCategory)
);
What I want is something like this:
Select
level, curCategory, parentCategory
from
Category
join
article on artCategory = curCategory
start with curCategory = 'Clothes'
connect by prior curCategory = parentCategory
order siblings by curCategory;
I want to print every article that belongs to 'Clothes'. So what I wanted to do is go through every child category, including the category itself ('Clothes'), and check whether the article category matches the curCategory. But when I execute my query I get zero records.
The syntax looks like Oracle, so here's one option that shows what you might do.
Use the VARCHAR2 datatype, not VARCHAR. I shortened the columns' lengths so that the output is easier to display. Also, there was a syntax error (a missing comma at the end of the article.name column declaration).
SQL> create table category (
2 curcategory varchar2(20) primary key,
3 parentcategory varchar2(20),
4 constraint check_cat check ( curcategory is not null ),
5 constraint fk_category1 foreign key ( parentcategory )
6 references category ( curcategory )
7 );
Table created.
SQL> create table article (
2 name varchar2(20),
3 artcategory varchar2(20),
4 constraint pk_article primary key ( name ),
5 constraint fk_artcategory foreign key ( artcategory )
6 references category ( curcategory )
7 );
Table created.
Some sample data:
SQL> select * from category;
CURCATEGORY PARENTCATEGORY
-------------------- --------------------
clothes
trousers clothes
shirts clothes
SQL> select * from article;
NAME ARTCATEGORY
-------------------- --------------------
t-shirt shirts
a-shirt shirts
jeans trousers
SQL>
According to that, the hierarchy should look like this:
clothes
  shirts
    t-shirt
    a-shirt
  trousers
    jeans
OK, let's make it so.
Using the WITH factoring clause (i.e. a CTE, common table expression), I UNION the two tables to create a single parent-child source. With that in place, writing the hierarchical query is straightforward.
SQL> with source as
2 (select c.parentcategory parent, c.curcategory child
3 from category c
4 union
5 select a.artcategory parent, a.name child
6 from article a
7 )
8 select lpad(' ', 2 * level - 1) || child clothes
9 from source
10 start with parent is null
11 connect by prior child = parent
12 order siblings by child;
CLOTHES
------------------------------------------------------------
 clothes
   shirts
     a-shirt
     t-shirt
   trousers
     jeans
6 rows selected.
SQL>
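CONNECT BY is Oracle-specific. As a rough sketch of the same parent-child idea in a database without it, the following uses a recursive CTE in SQLite through Python's sqlite3 module. This is a swapped-in technique, not the query above: the lvl and path columns are my own additions, and ordering by a '/'-joined path only approximates ORDER SIBLINGS BY (it can misorder names that contain characters sorting before '/').

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE category (
        curcategory    TEXT PRIMARY KEY,
        parentcategory TEXT REFERENCES category (curcategory)
    );
    CREATE TABLE article (
        name        TEXT PRIMARY KEY,
        artcategory TEXT REFERENCES category (curcategory)
    );
    INSERT INTO category VALUES
        ('clothes', NULL), ('trousers', 'clothes'), ('shirts', 'clothes');
    INSERT INTO article VALUES
        ('t-shirt', 'shirts'), ('a-shirt', 'shirts'), ('jeans', 'trousers');
""")

query = """
WITH RECURSIVE
-- Single parent-child source built from both tables, as in the answer.
source (parent, child) AS (
    SELECT parentcategory, curcategory FROM category
    UNION
    SELECT artcategory, name FROM article
),
-- Walk down from the root, tracking depth and the path from the root.
tree (child, lvl, path) AS (
    SELECT child, 1, child
    FROM source
    WHERE parent IS NULL
    UNION ALL
    SELECT s.child, t.lvl + 1, t.path || '/' || s.child
    FROM source AS s
    JOIN tree AS t ON s.parent = t.child
)
SELECT lvl, child FROM tree ORDER BY path
"""

rows = conn.execute(query).fetchall()

# Indent each node by its depth, similar to the lpad() call above.
for lvl, child in rows:
    print("  " * (lvl - 1) + child)
```

To start from a specific category instead of the root, the anchor condition `parent IS NULL` could be replaced with `child = 'clothes'`.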

Select all entries from one table which has two specific entries in another table

So, I have 2 tables defined like this:
CREATE TABLE tblPersons (
id INTEGER PRIMARY KEY AUTOINCREMENT,
name TEXT
);
CREATE TABLE tblHobbies (
person_id INTEGER REFERENCES tblPersons (id),
hobby TEXT
);
For example, I have three people in tblPersons:
1 | John
2 | Bob
3 | Eve
And these hobbies in tblHobbies:
1 | skiing
1 | serfing
1 | hiking
1 | gunsmithing
1 | driving
2 | table tennis
2 | driving
2 | hiking
3 | reading
3 | scuba diving
What I need is a query that returns the list of people who have several specific hobbies.
The only thing I could come up with is this:
SELECT id, name FROM tblPersons
INNER JOIN tblHobbies as hobby1 ON hobby1.hobby = 'driving'
INNER JOIN tblHobbies as hobby2 ON hobby2.hobby = 'hiking'
WHERE tblPersons.id = hobby1.person_id and tblPersons.id = hobby2.person_id;
But it is rather slow. Is there a better solution?
First, you don't have a primary key on tblHobbies; this is one cause of the slow query (and of other problems). You should also consider creating an index on tblHobbies.hobby.
Second, I'd advise you to create a third table to model the N:N cardinality that exists in your data and avoid redundant hobbies. Something like:
--Person
CREATE TABLE tblPersons (
id INTEGER PRIMARY KEY AUTOINCREMENT,
name TEXT
);
--Hobby
CREATE TABLE tblHobbies (
id INTEGER PRIMARY KEY AUTOINCREMENT,
hobby TEXT
);
--Associative table between Person and Hobby
CREATE TABLE tblPersonsHobbies (
person_id INTEGER REFERENCES tblPersons (id),
hobby_id INTEGER REFERENCES tblHobbies (id),
PRIMARY KEY (person_id, hobby_id)
);
Adds an extra table but it's worth it.
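--Query on your current model
SELECT tblPersons.id, tblPersons.name FROM tblPersons
INNER JOIN tblHobbies as hobby1 ON tblPersons.id = hobby1.person_id
WHERE hobby1.hobby IN ('driving', 'hiking')
GROUP BY tblPersons.id, tblPersons.name
HAVING COUNT(DISTINCT hobby1.hobby) = 2; --both hobbies, not just one of them
--Query on suggested model
SELECT tblPersons.id, tblPersons.name FROM tblPersons
INNER JOIN tblPersonsHobbies as personsHobby ON tblPersons.id = personsHobby.person_id
INNER JOIN tblHobbies as hobby1 ON hobby1.id = personsHobby.hobby_id
WHERE hobby1.hobby IN ('driving', 'hiking')
GROUP BY tblPersons.id, tblPersons.name
HAVING COUNT(DISTINCT hobby1.hobby) = 2;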
You can aggregate the hobbies table to get persons with both hobbies:
select person_id
from tblhobbies
group by person_id
having count(case when hobby = 'driving' then 1 end) > 0
and count(case when hobby = 'hiking' then 1 end) > 0
Or better with a WHERE clause restricting the records to read:
select person_id
from tblhobbies
where hobby in ('driving', 'hiking')
group by person_id
having count(distinct hobby) =2
There should be a unique constraint on person + hobby in the table, though; then you could remove the DISTINCT. And as I said in the comments section, it should even be person_id + hobby_id with a separate hobbies table. EDIT: Oops, I should have read the other answer. Michal suggested this data model three hours ago already :-)
If you want the names, select from the persons table where you find the IDs in above query:
select id, name
from tblpersons
where id in
(
select person_id
from tblhobbies
where hobby in ('driving', 'hiking')
group by person_id
having count(distinct hobby) =2
);
With the better data model you'd replace
from tblhobbies
where hobby in ('driving', 'hiking')
group by person_id
having count(distinct hobby) =2
with
from tblpersonshobbies
where hobby_id in (select id from tblhobbies where hobby in ('driving', 'hiking'))
group by person_id
having count(*) =2
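The aggregate approach can be tried on the question's sample data with a short sketch in Python's sqlite3 module. The helper name persons_with_all is hypothetical; it generalizes the query to any list of hobbies by comparing COUNT(DISTINCT hobby) with the number of hobbies asked for.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblPersons (
        id   INTEGER PRIMARY KEY AUTOINCREMENT,
        name TEXT
    );
    CREATE TABLE tblHobbies (
        person_id INTEGER REFERENCES tblPersons (id),
        hobby     TEXT
    );
    INSERT INTO tblPersons (name) VALUES ('John'), ('Bob'), ('Eve');
    INSERT INTO tblHobbies (person_id, hobby) VALUES
        (1, 'skiing'), (1, 'serfing'), (1, 'hiking'),
        (1, 'gunsmithing'), (1, 'driving'),
        (2, 'table tennis'), (2, 'driving'), (2, 'hiking'),
        (3, 'reading'), (3, 'scuba diving');
""")


def persons_with_all(conn, hobbies):
    """Return (id, name) of every person who has all of the given hobbies."""
    wanted = sorted(set(hobbies))            # ignore duplicates in the input
    placeholders = ", ".join("?" for _ in wanted)
    sql = f"""
        SELECT id, name
        FROM tblPersons
        WHERE id IN (
            SELECT person_id
            FROM tblHobbies
            WHERE hobby IN ({placeholders})
            GROUP BY person_id
            HAVING COUNT(DISTINCT hobby) = ?
        )
        ORDER BY id
    """
    return conn.execute(sql, (*wanted, len(wanted))).fetchall()


print(persons_with_all(conn, ["driving", "hiking"]))
print(persons_with_all(conn, ["driving", "skiing"]))
```

Only the placeholders are interpolated into the SQL text; the hobby values themselves are passed as bound parameters.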

Add or delete repeated row

I have an output like this:
id name date school school1
1 john 11/11/2001 nyu ucla
1 john 11/11/2001 ucla nyu
2 paul 11/11/2011 uft mit
2 paul 11/11/2011 mit uft
I would like to achieve this:
id name date school school1
1 john 11/11/2001 nyu ucla
2 paul 11/11/2011 mit uft
I am using direct join as in:
select distinct
a.id, a.name,
b.date,
c.school,
a1.id, a1.name,
b1.date,
c1.school
from table a, table b, table c,table a1, table b1, table c1
where
a.id=b.id
and...
Any ideas?
We will need more information such as what your tables contain and what you are after.
One thing I noticed is that you have a school column and then school1. Normalization rules (this is already a first-normal-form issue, so it also breaks 3NF) say you should not duplicate a field and append numbers to it to hold more values, even if you think there will only ever be one or two additional items. Create a second table that associates each user with one or more schools.
I agree with everyone else that both your source table and your desired output are poor design. While you probably can't do anything about your source table, I recommend the following code and output:
Select id, name, date, school from MyTable
union
Select id, name, date, school1 from MyTable;
(repeat as necessary)
This will give you results in the format:
id name date school
1 john 11/11/2001 nyu
1 john 11/11/2001 ucla
2 paul 11/11/2011 mit
2 paul 11/11/2011 uft
(Note: UNION returns only distinct rows by default, unlike UNION ALL, so the distinct keyword isn't needed.)
With this format, you could easily count the number of schools per student, number of students per school, etc.
If processing time and/or storage space is a factor here, you could then split this into 2 tables, 1 with the id,name & date, the other with the id & school (basically what JonH just said). But if you're just working up some simple statistics, this should suffice.
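Here is a small sketch of the UNION approach, run in SQLite through Python's sqlite3 module. MyTable and its contents are assumptions: it simply holds the four output rows from the question, since the real source tables were not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE MyTable (
        id INTEGER, name TEXT, date TEXT, school TEXT, school1 TEXT
    )
""")
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?, ?, ?, ?)",
    [
        (1, "john", "11/11/2001", "nyu", "ucla"),
        (1, "john", "11/11/2001", "ucla", "nyu"),
        (2, "paul", "11/11/2011", "uft", "mit"),
        (2, "paul", "11/11/2011", "mit", "uft"),
    ],
)

# UNION stacks the two school columns and drops duplicate rows.
rows = conn.execute("""
    SELECT id, name, date, school FROM MyTable
    UNION
    SELECT id, name, date, school1 FROM MyTable
    ORDER BY id, school
""").fetchall()

# UNION ALL keeps every row, so each school shows up twice per student.
all_rows = conn.execute("""
    SELECT id, name, date, school FROM MyTable
    UNION ALL
    SELECT id, name, date, school1 FROM MyTable
""").fetchall()

for row in rows:
    print(row)
print(len(all_rows))
```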
This problem was just too irresistible, so I took a guess at the data structures that we are dealing with. The technology wasn't specified in the question. This is in Transact-SQL.
create table student
(
id int not null primary key identity,
name nvarchar(100) not null default '',
graduation_date date not null default getdate(),
)
go
create table school
(
id int not null primary key identity,
name nvarchar(100) not null default ''
)
go
create table student_school_asc
(
student_id int not null foreign key references student (id),
school_id int not null foreign key references school (id),
primary key (student_id, school_id)
)
go
insert into student (name, graduation_date) values ('john', '2001-11-11')
insert into student (name, graduation_date) values ('paul', '2011-11-11')
insert into school (name) values ('nyu')
insert into school (name) values ('ucla')
insert into school (name) values ('uft')
insert into school (name) values ('mit')
insert into student_school_asc (student_id, school_id) values (1,1)
insert into student_school_asc (student_id, school_id) values (1,2)
insert into student_school_asc (student_id, school_id) values (2,3)
insert into student_school_asc (student_id, school_id) values (2,4)
select
s.id,
s.name,
s.graduation_date as [date],
(select max(name) from
(select name,
RANK() over (order by name) as rank_num
from school sc
inner join student_school_asc ssa on ssa.school_id = sc.id
where ssa.student_id = s.id) s1 where s1.rank_num = 1) as school,
(select max(name) from
(select name,
RANK() over (order by name) as rank_num
from school sc
inner join student_school_asc ssa on ssa.school_id = sc.id
where ssa.student_id = s.id) s2 where s2.rank_num = 2) as school1
from
student s
Result:
id name date school school1
--- ----- ---------- ------- --------
1 john 2001-11-11 nyu ucla
2 paul 2011-11-11 mit uft
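Ranking the schools by name, as above, amounts to putting the alphabetically smaller school in the first column. If the duplicated two-column output from the question is all you have, a rough sketch of the same effect is to keep only the rows where school sorts before school1. The table name MyTable and its rows are assumptions taken from the question's output, and the approach only works for exactly two schools per student; it is shown here in SQLite via Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE MyTable (
        id INTEGER, name TEXT, date TEXT, school TEXT, school1 TEXT
    )
""")
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?, ?, ?, ?)",
    [
        (1, "john", "11/11/2001", "nyu", "ucla"),
        (1, "john", "11/11/2001", "ucla", "nyu"),
        (2, "paul", "11/11/2011", "uft", "mit"),
        (2, "paul", "11/11/2011", "mit", "uft"),
    ],
)

# Of each mirrored pair of rows, keep the one whose schools are in
# alphabetical order; the swapped copy is filtered out.
rows = conn.execute("""
    SELECT id, name, date, school, school1
    FROM MyTable
    WHERE school < school1
    ORDER BY id
""").fetchall()

for row in rows:
    print(row)
```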