I want to count how frequently values appear in certain columns and create a new table, with the values as columns and the frequencies as data. Example:
create table users
(id number primary key,
name varchar2(255));
insert into users values (1, 'John');
insert into users values (2, 'Joe');
insert into users values (3, 'Max');
create table meals
(id number primary key,
user_id number,
food varchar2(255));
insert into meals values (1, 1, 'Apple');
insert into meals values (2, 1, 'Apple');
insert into meals values (3, 1, 'Orange');
insert into meals values (4, 1, 'Bread');
insert into meals values (5, 1, 'Apple');
insert into meals values (6, 2, 'Apple');
insert into meals values (7, 2, 'Bread');
insert into meals values (8, 2, 'Bread');
insert into meals values (9, 2, 'Apple');
insert into meals values (10, 3, 'Orange');
insert into meals values (11, 3, 'Bread');
insert into meals values (12, 3, 'Bread');
So I have different users and their meals (here Bread, Apple and Orange). For every user, I want to know how often they ate each food. The following query does exactly what I want:
select
(select count(id) from meals where meals.user_id = users.id and meals.food = 'Apple') as count_apple,
(select count(id) from meals where meals.user_id = users.id and meals.food = 'Orange') as count_orange,
(select count(id) from meals where meals.user_id = users.id and meals.food = 'Bread') as count_bread
from users;
The problem is that this is REALLY slow, especially with more than 100,000 users and dozens of different foods. I am sure there is a faster way, but I am not experienced enough in SQL to find it.
If you're using Oracle 11g or later, then you can use the pivot operator, like so:
select * from (
select user_id, food from meals
)
pivot (count(*) as count for (food) in ('Apple', 'Orange', 'Bread'));
Otherwise you'll have to do a manual pivot:
select user_id,
sum(case when food = 'Apple' then 1 else 0 end) count_apple,
sum(case when food = 'Orange' then 1 else 0 end) count_orange,
sum(case when food = 'Bread' then 1 else 0 end) count_bread
from meals
group by user_id
In either case, these should be faster than your original approach as you're only accessing the meals table once.
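To sanity-check the manual pivot against the original correlated subqueries, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is an assumption made only to keep the example runnable (the question itself is about Oracle), and the column types are simplified accordingly.

```python
import sqlite3

# In-memory SQLite database standing in for the Oracle schema (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table users (id integer primary key, name text);
    insert into users values (1, 'John'), (2, 'Joe'), (3, 'Max');
    create table meals (id integer primary key, user_id integer, food text);
    insert into meals values
        (1, 1, 'Apple'), (2, 1, 'Apple'), (3, 1, 'Orange'), (4, 1, 'Bread'),
        (5, 1, 'Apple'), (6, 2, 'Apple'), (7, 2, 'Bread'), (8, 2, 'Bread'),
        (9, 2, 'Apple'), (10, 3, 'Orange'), (11, 3, 'Bread'), (12, 3, 'Bread');
""")

# Manual pivot: one pass over meals, one conditional sum per food.
pivot_rows = conn.execute("""
    select user_id,
           sum(case when food = 'Apple'  then 1 else 0 end) as count_apple,
           sum(case when food = 'Orange' then 1 else 0 end) as count_orange,
           sum(case when food = 'Bread'  then 1 else 0 end) as count_bread
    from meals
    group by user_id
    order by user_id
""").fetchall()

# Original approach: one correlated subquery per food and per user.
subquery_rows = conn.execute("""
    select users.id,
           (select count(id) from meals where meals.user_id = users.id and meals.food = 'Apple'),
           (select count(id) from meals where meals.user_id = users.id and meals.food = 'Orange'),
           (select count(id) from meals where meals.user_id = users.id and meals.food = 'Bread')
    from users
    order by users.id
""").fetchall()

print(pivot_rows)
```

Both queries agree on this sample data. One difference to keep in mind: the grouped query only returns users that have at least one row in meals, so an outer join from users would be needed if users without meals must appear with zero counts.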
I've hit a small wall with a query. I'm trying to find transactions that contain type 01 while excluding transactions that contain item 23 or 25.
Here's a reprex in SQL Fiddle:
create table purchases (
transaction_id int,
item int,
type int,
customer char(1)
);
insert into purchases values (1, 23, 01, "A");
insert into purchases values (1, 25, 01, "A");
insert into purchases values (2, 23, 01, "B");
insert into purchases values (2, 25, 01, "B");
insert into purchases values (2, 1, 01, "B");
insert into purchases values (3, 3, 01, "A");
insert into purchases values (4, 23, 01,"B");
insert into purchases values (4, 25, 01,"B");
insert into purchases values (5, 23, 01,"A");
insert into purchases values (6, 4, 02,"C");
insert into purchases values (7, 9, 03,"C");
Here's the query to identify transactions that have only items 23 and 25 and nothing else. It works (the result should be transactions 1, 4 and 5).
select transaction_id from purchases where item in (23,25)
and transaction_id not in (select transaction_id from purchases where item not in (23,25));
However, I'm struggling to single out the transactions that have type 01 but not items 23 and 25.
I tried the query below, which is based on the first one, but it returns transactions 2 and 3 when it should return only 3, since transaction 2 does contain items 23 and 25.
select * from purchases where type = 1 and transaction_id not in (select transaction_id from purchases where item in (23,25)
and transaction_id not in (select transaction_id from purchases where item not in (23,25)));
Expected result:
transaction_id item type customer
3 3 01 A
Based on your updated question, I'd suggest using NOT EXISTS, like below:
select * from purchases p1 where not exists
(
select 1 from purchases p2 where p1.transaction_id=p2.transaction_id
and p2.item in (23,25))
and type=1
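As a quick check of the NOT EXISTS approach, here is a minimal sketch run against the sample data with Python's built-in sqlite3 module. Using SQLite here is an assumption made for the sake of a self-contained example; the query text itself is unchanged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table purchases (
        transaction_id int,
        item int,
        type int,
        customer char(1)
    );
    insert into purchases values
        (1, 23, 1, 'A'), (1, 25, 1, 'A'),
        (2, 23, 1, 'B'), (2, 25, 1, 'B'), (2, 1, 1, 'B'),
        (3, 3, 1, 'A'),
        (4, 23, 1, 'B'), (4, 25, 1, 'B'),
        (5, 23, 1, 'A'),
        (6, 4, 2, 'C'),
        (7, 9, 3, 'C');
""")

# Keep type 1 rows whose transaction has no row at all with item 23 or 25.
not_exists_rows = conn.execute("""
    select * from purchases p1
    where not exists (
        select 1 from purchases p2
        where p1.transaction_id = p2.transaction_id
          and p2.item in (23, 25)
    )
    and type = 1
""").fetchall()

print(not_exists_rows)
```

Transaction 2 drops out because the subquery looks at every row of the same transaction, not only the row currently being tested.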
I see that you have already changed the expected result in the question several times (while the query itself does not change), so I'm not sure what exactly you want to get.
In any case, you can take this dbfiddle example, which uses arrays filtered by distinct sorted elements:
You want one row per transaction, so aggregate and GROUP BY transaction_id. Then use the HAVING clause and COUNT conditionally.
select transaction_id
from purchases
group by transaction_id
having count(*) filter (where item = 23) = 0
and count(*) filter (where item = 25) = 0
and count(*) filter (where type = 1) > 0
order by transaction_id;
Demo: https://dbfiddle.uk/?rdbms=postgres_14&fiddle=520755370f13d41ba35ca12e7eb5277e
If you want to show all rows matching above transaction IDs:
select * from purchases where transaction_id in ( <above query> );
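For engines without the FILTER clause, the same conditional counting can be written with SUM(CASE ...). The sketch below is an assumption-laden illustration using Python's built-in sqlite3 module rather than PostgreSQL, and it also shows the grouped query plugged into the IN (...) placeholder above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table purchases (transaction_id int, item int, type int, customer char(1));
    insert into purchases values
        (1, 23, 1, 'A'), (1, 25, 1, 'A'),
        (2, 23, 1, 'B'), (2, 25, 1, 'B'), (2, 1, 1, 'B'),
        (3, 3, 1, 'A'),
        (4, 23, 1, 'B'), (4, 25, 1, 'B'),
        (5, 23, 1, 'A'),
        (6, 4, 2, 'C'),
        (7, 9, 3, 'C');
""")

# SUM(CASE ...) swapped in for COUNT(*) FILTER (WHERE ...); the logic is the same.
grouped_query = """
    select transaction_id
    from purchases
    group by transaction_id
    having sum(case when item in (23, 25) then 1 else 0 end) = 0
       and sum(case when type = 1 then 1 else 0 end) > 0
"""

transaction_ids = [row[0] for row in conn.execute(grouped_query + " order by transaction_id")]

# All rows belonging to the matching transactions.
matching_rows = conn.execute(
    "select * from purchases where transaction_id in (" + grouped_query + ")"
).fetchall()

print(transaction_ids, matching_rows)
```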
Here is one option
select p.*
from purchases p
join (
select transaction_id
from purchases
group by transaction_id
having count(case when item in (25,23) then 1 end)=0
and count(case type when 1 then 1 end)>0
)x
on p.transaction_id=x.transaction_id
For your sample data:
insert into purchases values (1, 23, 01, 'A');
insert into purchases values (1, 25, 01, 'A');
insert into purchases values (2, 23, 01, 'B');
insert into purchases values (2, 25, 01, 'B');
insert into purchases values (2, 1, 01, 'B');
insert into purchases values (3, 3, 01, 'A');
insert into purchases values (4, 23, 01,'B');
insert into purchases values (4, 25, 01,'B');
insert into purchases values (5, 23, 01,'A');
insert into purchases values (6, 4, 02,'C');
insert into purchases values (7, 9, 03,'C');
Result:
3 3 1 A
I have two tables. One is objects, with the attributes id and is_green. The other is object_closure, with the attributes ancestor_id, descendant_id, and created_at, i.e.
Objects: id, is_green
Object_closure: ancestor_id, descendant_id, created_at
There are more attributes in the objects table, but they are not relevant to this question.
I have a query like this:
-- create a table
CREATE TABLE objects (
id INTEGER PRIMARY KEY,
is_green boolean
);
CREATE TABLE object_Closure (
ancestor_id INTEGER ,
descendant_id INTEGER,
created_at date
);
-- insert some values
INSERT INTO objects VALUES (1, 1 );
INSERT INTO objects VALUES (2, 1 );
INSERT INTO objects VALUES (3, 1 );
INSERT INTO objects VALUES (4, 0 );
INSERT INTO objects VALUES (5, 1 );
INSERT INTO objects VALUES (6, 1 );
INSERT INTO object_Closure VALUES (1, 2, '2020-12-12');
INSERT INTO object_Closure VALUES (1, 3, '2020-12-13');
INSERT INTO object_Closure VALUES (2, 3, '2020-12-14');
INSERT INTO object_Closure VALUES (4, 5, '2020-12-15');
INSERT INTO object_Closure VALUES (4, 6, '2020-12-16');
INSERT INTO object_Closure VALUES (5, 6, '2020-12-17');
-- fetch some values
SELECT
O.id,
P.id,
group_concat(DISTINCT P.id ) as p_ids
FROM objects O
LEFT JOIN object_Closure OC on O.id=OC.descendant_id
LEFT JOIN objects P on OC.ancestor_id=P.id AND P.is_green=1
GROUP BY O.id
The result is
query result
I would like P.id for O.id=6 to be 5 instead of null. After all, 5 is still a parent ID (P.id). More importantly, if there is more than one parent, I want P.id to show the one that was created first (see created_at).
I understand why this happens: the first row the system picks is null, and that null comes from the join condition on is_green. However, I need P.id to contain only objects that are green.
I cannot do an inner join, because I need the other attributes of the table, and rows where both P.id and p_ids are null must still appear in the result. I cannot restructure the database; it already exists and cannot be changed. I also cannot just use a MIN() or MAX() aggregate, because the ID that is picked must be the first one created.
So is there a way to skip the nulls in the join?
Or is there a way to filter the selection in the select clause?
Or to do an order by before the grouping?
P.S. My original code concatenates the P.id values in order of created_at. For some reason, I cannot replicate that in the online SQL simulator.
I have a table of user experiences; it contains multiple records for the same user.
JSON example of the data:
{
user_id : 1,
location: 'india',
company_id: 5,
...other fields
}
{
user_id : 1,
location: 'united kingdom',
company_id: 6
...other fields
}
I want to run a query that returns users who have worked at companies satisfying an IN condition against multiple arrays.
E.g.
Array1 of company IDs: 1,2,4,5,6,7,8,10
Array2 of company IDs: 2,6,50,100,12,4
The query should return users who have worked at at least one company from each array, so the IN conditions for both arrays must be satisfied.
I tried the following query with no luck:
select * from <table> where company_id IN(5,7,8) and company_id IN(1,4,3)
even though two records exist in the table for one user, with company_id 5 and 4.
create table my_table (user_id int, company_id int);
insert into my_table (user_id, company_id)
values (1, 5), (1, 6), (2, 4), (2, 5), (2, 6), (3, 5);
select user_id from my_table where company_id in (5, 7, 8)
intersect
select user_id from my_table where company_id in (1, 4, 3);
As you described, you need the intersection of the users who have worked at companies in each of the two sets.
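The INTERSECT query can be tried out as-is with Python's built-in sqlite3 module. This is only a minimal sketch on the sample rows above, and SQLite is an assumption since the question does not name a database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table my_table (user_id int, company_id int);
    insert into my_table (user_id, company_id)
    values (1, 5), (1, 6), (2, 4), (2, 5), (2, 6), (3, 5);
""")

# Each SELECT finds users matching one array; INTERSECT keeps users found by both.
both_sets = conn.execute("""
    select user_id from my_table where company_id in (5, 7, 8)
    intersect
    select user_id from my_table where company_id in (1, 4, 3)
""").fetchall()

# The single-row filter from the question can never match:
# no one row has a company_id that is in both lists.
single_row_filter = conn.execute("""
    select * from my_table
    where company_id in (5, 7, 8) and company_id in (1, 4, 3)
""").fetchall()

print(both_sets, single_row_filter)
```

Only user 2 has a row in each set (company 5 and company 4), which is why the per-row AND returns nothing while the intersection returns that user.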
I want to create a view that displays book titles and the number of reviews written for each book.
What are the options when the values are not compatible?
Relevant columns in the Books table:
ISBN13 PK bigint
Title nvarchar(50)
Language nvarchar(30)
Author Id FK int
Category ID FK int
Sample data Books:
INSERT INTO Books VALUES (9783852913735, 'Ulysses', 'English', 100, 'January 06, 2002', 1, null);
INSERT INTO Books VALUES (9780195038637, 'Battle Cry of Freedom', 'English', 490, 'February 25, 1988', 99, null);
INSERT INTO Books VALUES (9789178615155, 'Surhörningen', 'Swedish', 195, '2019', 4, null);
INSERT INTO Books VALUES (9789178614577, 'Jag älskar regnbågsenhörningar', 'Swedish', 190, '2021', 2, null);
Relevant columns in the Reviews table:
ReviewId PK int
BookId FK bigint -- FK to ISBN13
CategoryID FK
WriterId FK
Date
Sample data Reviews:
insert into Reviews values(0020, '9783852913735', '120', 11, '2001-02-21');
insert into Reviews values(0021, '9789177836599', '140', 4, '2001-10-19');
insert into Reviews values(0022, '9789178130979', '110', 1, '2002-02-22');
insert into Reviews values(0023, '9789178130979', '90', 8, '2003-09-06');
insert into Reviews values(0024, '9789178614677', '50', 2, '2005-08-29');
insert into Reviews values(0025, '9789178615155', '10', 5, '2004-08-25');
insert into Reviews values(0026, '971019503872', '10', 9, '2009-06-11');
insert into Reviews values(0027, '9780195038637', '20', 2, '2010-11-10');
Sample data Categories:
insert into Categories (CategoryId, Name) values(10, 'Architecture');
insert into Categories values(20, 'Art');
insert into Categories values(30, 'Astrology');
insert into Categories values(40, 'Baking');
insert into Categories values(50, 'Business Management');
insert into Categories values(60, 'Biology');
insert into Categories values(70, 'Comics');
insert into Categories values(80, 'Computational Science');
SELECT Books.Title, Books.[Author Id]
FROM Books
INNER JOIN Reviews ON Reviews.BookId=Books.ISBN13;
Below is my code for the reviews part, as I want to show the number of reviews per book:
SELECT
BookId,
COUNT(BookId) [Reviews]
FROM
Reviews
GROUP BY BookId
HAVING COUNT(BookId) > 1
So expected results would be:
Title | Author | BookId | Category | Number of Reviews
Have a look at this query. I created the view, and since the categories have no values compatible with the Books table, I used a left join for them, so the view still returns the records that have values in both Books and Reviews. Let me know in the comments if you need any additions or alterations. Thanks for posting insert scripts and table definitions, which made it quick to implement and test.
CREATE view My_View AS
(
SELECT
[B].[ISBN13] AS [BookId]
,[B].[Title]
,[B].[AuthorId] AS [Author]
,[C].[Name] As [Category]
, COUNT([R].[ReviewId]) OVER ( PARTITION BY [B].[Title]) AS [Number of reviews]
FROM Reviews [R]
INNER JOIN Books [B]
ON [R].[BookId] = [B].[ISBN13]
LEFT JOIN Categories [C]
ON [B].[CategoryId] = [C].[CategoryId]
)
SELECT * FROM My_View
Assuming from your sample query that you are after just a count of reviews, you would have something like this (obviously guessing at the other tables you need to join with). There are several ways to correlate, but a simple count only needs an inline correlated subquery:
create view MyView as
select
b.Title,
a.Name Author,
b.ISBN13 BookId,
c.Name Category,
(select Count(*) from Reviews r where r.BookId=b.ISBN13) Reviews
from Books b
join Categories c on c.Id=b.CategoryId
join Authors a on a.Id=b.AuthorId
Using a subset of the data you added, this query works fine
Title BookId Reviews
------------------------------ --------------- -----------
Ulysses 9783852913735 1
Battle Cry of Freedom 9780195038637 1
Surhörningen 9789178615155 1
Jag älskar regnbågsenhörningar 9789178614577 0
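The counts above can be reproduced with a cut-down version of the schema. The sketch below uses Python's built-in sqlite3 module, which is an assumption (the question targets SQL Server), and keeps only the columns the count needs, so the Authors and Categories joins are left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Reduced tables: only the columns needed for the review count.
    create table Books (ISBN13 integer primary key, Title text);
    insert into Books values
        (9783852913735, 'Ulysses'),
        (9780195038637, 'Battle Cry of Freedom'),
        (9789178615155, 'Surhörningen'),
        (9789178614577, 'Jag älskar regnbågsenhörningar');

    create table Reviews (ReviewId integer primary key, BookId integer);
    insert into Reviews values
        (20, 9783852913735), (21, 9789177836599), (22, 9789178130979),
        (23, 9789178130979), (24, 9789178614677), (25, 9789178615155),
        (26, 971019503872),  (27, 9780195038637);
""")

# Inline correlated subquery: one count per book, zero when nothing matches.
review_counts = conn.execute("""
    select b.Title,
           b.ISBN13 as BookId,
           (select count(*) from Reviews r where r.BookId = b.ISBN13) as Reviews
    from Books b
    order by b.Title
""").fetchall()

for row in review_counts:
    print(row)
```

Because the count is a subquery rather than a join, a book with no matching review still appears, with a count of 0.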
I have a database of products which I'd like to filter by arbitrary categories. Let's say for the sake of an example that I run a garage. I have a section of products which are cars.
Each car should have a collection of attributes, all cars have the same number and type of attributes; for instance colour:red, doors:2, make:ford ; with those same attributes set to various values on all the cars.
Gut feeling tells me that it would be best to add "colour", "doors" and "make" columns to the product table.
HOWEVER: Not ALL the products in the table are cars. Perhaps I would like to list tires on the page of cars. Obviously, "colour" and "doors" won't apply to tires. Even so, if a user selects colour=red as a filter, I would still like the tires to be shown, as they lack the colour attribute.
Mulling it over (and I'm really not a database guy, so I apologise if this approach is horrible), I considered having a single "attributes" column which I could fill with an arbitrary number of arbitrarily named attributes, then use SQL's string functions to do the filtering. I guess you could even use a bit field here if you planned carefully. This seems hackish to me though, and I'd be interested to know how some of the larger sites such as Amazon do this.
What are the issues with these approaches, can anyone recommend any alternatives or shed any light on the subject for me?
Thanks in advance
You should read about database normalization. It is generally not a good idea to use concatenated strings as values in a single column. I made a very small sqlfiddle for you to start playing around. This does not really solve all your problems, but it may lead you in the right direction.
Schema:
CREATE TABLE product (id int, name varchar(200), info varchar(200));
INSERT INTO product (id, name, info) VALUES (100, "Porsche", "cool");
...
INSERT INTO product (id, name, info) VALUES (103, "Tires", "you need them!");
CREATE TABLE attr (id int, product_id int, a_name varchar(200), a_value varchar(200));
INSERT INTO attr (id, product_id, a_name, a_value) VALUES (1, 100, "color", "black");
INSERT INTO attr (id, product_id, a_name, a_value) VALUES (2, 100, "doors", "2");
...
A Query:
SELECT * FROM product INNER JOIN attr ON attr.product_id=product.id
WHERE attr.a_name="doors" AND attr.a_value = "2"
Anyone reading this in the future, I managed to get the results I wanted thanks to luksch taking the time to help me out!!! Thanks!!!
Using this layout:
CREATE TABLE product (id int, name varchar(200));
INSERT INTO product (id, name) VALUES (100, "Red Porsche");
INSERT INTO product (id, name) VALUES (101, "Red Ferrari V8");
INSERT INTO product (id, name) VALUES (102, "Red Ferrari V12");
INSERT INTO product (id, name) VALUES (103, "Blue Porsche");
INSERT INTO product (id, name) VALUES (104, "Blue Ferrari V8");
INSERT INTO product (id, name) VALUES (105, "Blue Ferrari V12");
INSERT INTO product (id, name) VALUES (106, "Snow Tires");
INSERT INTO product (id, name) VALUES (107, "Fluffy Dice");
CREATE TABLE attr (id int, product_id int, a_name varchar(200), a_value varchar(200));
INSERT INTO attr (id, product_id, a_name, a_value) VALUES (1, 100, "colour", "red");
INSERT INTO attr (id, product_id, a_name, a_value) VALUES (1, 101, "colour", "red");
INSERT INTO attr (id, product_id, a_name, a_value) VALUES (1, 101, "cylinders", "8");
INSERT INTO attr (id, product_id, a_name, a_value) VALUES (1, 102, "colour", "red");
INSERT INTO attr (id, product_id, a_name, a_value) VALUES (1, 102, "cylinders", "12");
INSERT INTO attr (id, product_id, a_name, a_value) VALUES (1, 103, "colour", "blue");
INSERT INTO attr (id, product_id, a_name, a_value) VALUES (1, 104, "colour", "blue");
INSERT INTO attr (id, product_id, a_name, a_value) VALUES (1, 104, "cylinders", "8");
INSERT INTO attr (id, product_id, a_name, a_value) VALUES (1, 105, "colour", "blue");
INSERT INTO attr (id, product_id, a_name, a_value) VALUES (1, 105, "cylinders", "12");
I achieved the result I wanted; which was two things:
Firstly I wanted to be able to select products by attribute, say by colour and cylinders, but also show any products which have neither the colour nor cylinders attribute, which I achieved with this query:
SELECT DISTINCT product.id, name, a_value
FROM product
LEFT JOIN attr
ON product_id=product.id
WHERE
(
(a_name="colour" AND a_value="blue")
OR
(a_name IS NULL)
)
AND product.id IN
(
SELECT product.id
FROM product
LEFT JOIN attr
ON product_id=product.id
WHERE
(a_name="cylinders" AND a_value="12")
OR
(a_name IS NULL)
)
This lists all the blue cars with 12 cylinders, and also lists the tires and fluffy dice, since they have neither a colour nor a cylinder count. This can easily be adapted to filter on one attribute, or you can add more AND / IN clauses to add more filters.
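For anyone who wants to try the filter without a MySQL instance, here is a minimal sketch using Python's built-in sqlite3 module. It makes a few assumptions that are not in the layout above: string literals use single quotes (standard SQL), the attr rows get distinct ids, and an ORDER BY is added so the output order is predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table product (id int, name varchar(200));
    insert into product (id, name) values
        (100, 'Red Porsche'), (101, 'Red Ferrari V8'), (102, 'Red Ferrari V12'),
        (103, 'Blue Porsche'), (104, 'Blue Ferrari V8'), (105, 'Blue Ferrari V12'),
        (106, 'Snow Tires'), (107, 'Fluffy Dice');

    create table attr (id int, product_id int, a_name varchar(200), a_value varchar(200));
    insert into attr (id, product_id, a_name, a_value) values
        (1, 100, 'colour', 'red'),
        (2, 101, 'colour', 'red'),   (3, 101, 'cylinders', '8'),
        (4, 102, 'colour', 'red'),   (5, 102, 'cylinders', '12'),
        (6, 103, 'colour', 'blue'),
        (7, 104, 'colour', 'blue'),  (8, 104, 'cylinders', '8'),
        (9, 105, 'colour', 'blue'),  (10, 105, 'cylinders', '12');
""")

# Blue products with 12 cylinders, plus products that have no attributes at all.
filtered = conn.execute("""
    select distinct product.id, name, a_value
    from product
    left join attr on product_id = product.id
    where ((a_name = 'colour' and a_value = 'blue') or a_name is null)
      and product.id in (
          select product.id
          from product
          left join attr on product_id = product.id
          where (a_name = 'cylinders' and a_value = '12') or a_name is null
      )
    order by product.id
""").fetchall()

for row in filtered:
    print(row)
```

Note that the Blue Porsche (103) is not returned: it has a colour attribute but no cylinders attribute, and the a_name IS NULL branch only matches products with no attributes whatsoever.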
I also wanted to be able to list all relevant attributes (I use WHERE 1 in this example, but in practice this would be WHERE idfolders=? to list all attributes relevant to a specific folder):
SELECT DISTINCT a_value, a_name
FROM product
INNER JOIN attr
ON product_id=product.id
WHERE 1