Related
I'm trying to perform a query that returns an aggregation of values from the same table with information from others through a foreign key, but I can't. In the example below, I wanted to return the total sales by state on 2020-01-01 and 2021-01-01, showing the name of the state.
Tables script:
CREATE TABLE IF NOT EXISTS estado (
id SERIAL PRIMARY KEY,
estado VARCHAR(100)
)
CREATE TABLE IF NOT EXISTS municipio (
id SERIAL PRIMARY KEY,
estado integer REFERENCES estado(id),
municipio VARCHAR(100)
)
CREATE TABLE IF NOT EXISTS vendas (
id SERIAL PRIMARY KEY,
municipio integer REFERENCES municipio(id),
valor numeric,
data_venda date
)
INSERT INTO estado VALUES (1, 'PR');
INSERT INTO estado VALUES (2, 'SC');
INSERT INTO estado VALUES (3, 'RS');
INSERT INTO municipio VALUES (1, 1, 'Pelotas');
INSERT INTO municipio VALUES (2, 1, 'Caxias do Sul');
INSERT INTO municipio VALUES (3, 1, 'Porto Alegre');
INSERT INTO municipio VALUES (4, 2, 'Florianopolis');
INSERT INTO municipio VALUES (5, 2, 'Chapeco');
INSERT INTO municipio VALUES (6, 2, 'Itajai');
INSERT INTO municipio VALUES (7, 3, 'Curitiba');
INSERT INTO municipio VALUES (8, 3, 'Maringa');
INSERT INTO municipio VALUES (9, 3, 'Foz do Iguaçu');
INSERT INTO vendas VALUES (1, 6, 5, '2020-01-01');
INSERT INTO vendas VALUES (2, 5, 10, '2021-01-01');
INSERT INTO vendas VALUES (3, 5, 5, '2020-01-01');
INSERT INTO vendas VALUES (4, 4, 2, '2020-01-01');
INSERT INTO vendas VALUES (5, 3, 10, '2021-01-01');
INSERT INTO vendas VALUES (6, 3, 12, '2020-01-01');
INSERT INTO vendas VALUES (7, 3, 20, '2020-01-01');
INSERT INTO vendas VALUES (8, 2, 10, '2020-01-01');
INSERT INTO vendas VALUES (9, 1, 11, '2021-01-01');
INSERT INTO vendas VALUES (10, 9, 4, '2020-01-01');
My attempt (absurd values and the RS ones do not appear):
SELECT
e.estado, SUM(v.valor) as sum2021, SUM(v2.valor) as sum2020
FROM vendas v
CROSS JOIN vendas v2
INNER JOIN municipio m ON v.municipio = m.id
INNER JOIN estado e ON m.estado = e.id
WHERE v.data_venda = '2021-01-01'
AND v2.data_venda = '2020-01-01'
GROUP BY 1;
Translating some terms:
município = city
estado = state
vendas = sales
valor = value
data_venda = date of sale
You're cross joining vendas with itself (as v1 and v2), meaning that each row from it will be matched with each other row (i.e., a Cartesian product), which creates the unexpected results you're seeing.
The good news is that you don't need this join. You can use an aggregate function (sum in this case) on a subset of the rows from the query using the filter clause:
SELECT
e.estado,
SUM(v.valor) FILTER (WHERE data_venda = '2021-01-01') AS sum2021,
SUM(v.valor) FILTER (WHERE data_venda = '2020-01-01') AS sum2020
FROM vendas v
INNER JOIN municipio m ON v.municipio = m.id
INNER JOIN estado e ON m.estado = e.id
GROUP BY
e.estado;
SQLFiddle demo
How can I improve this query for use in large tables....?
I use a table ('DataValues') to store a collection of values ('Value') for collections ('Visit_id') ie it records certain values for each visit.
I use a table ('MatchItems') to store dynamic match sets 'MatchSet' of values ('Value'), sets can contain any number of values. The table also has a IsNeg field to indicate if the match should require a value to be not present in the visit collection.
This allows me to dynamically match visits that conform to certain criteria such as
Must contain values A, B and C and NOT D OR C and B AND NOT A.
ie (Value = A and Value = B and Value = C and Value /= D)
or (Value = C and Value = B and Value /= A)
I have a query that delivers a reasonable solution fiddle:
CREATE TABLE DataValues (
id NUMBER(5) CONSTRAINT DataValues_pk PRIMARY KEY,
Visit_id Number(5) ,
Value varchar(5)
);
INSERT INTO DataValues VALUES (1, 1, 'M');
INSERT INTO DataValues VALUES (2, 1, 'I');
INSERT INTO DataValues VALUES (3, 1, 'C');
INSERT INTO DataValues VALUES (4, 1, 'K');
INSERT INTO DataValues VALUES (5, 1, 'E');
INSERT INTO DataValues VALUES (6, 1, 'Y');
INSERT INTO DataValues VALUES (7, 2, 'M');
INSERT INTO DataValues VALUES (8, 2, 'O');
INSERT INTO DataValues VALUES (9, 2, 'U');
INSERT INTO DataValues VALUES (10, 2, 'S');
INSERT INTO DataValues VALUES (11, 2, 'E');
INSERT INTO DataValues VALUES (12, 3, 'C');
INSERT INTO DataValues VALUES (13, 3, 'A');
INSERT INTO DataValues VALUES (14, 3, 'T');
INSERT INTO DataValues VALUES (15, 4, 'S');
INSERT INTO DataValues VALUES (16, 4, 'A');
INSERT INTO DataValues VALUES (17, 4, 'T');
INSERT INTO DataValues VALUES (18, 5, 'M');
INSERT INTO DataValues VALUES (19, 5, 'A');
INSERT INTO DataValues VALUES (20, 5, 'T');
CREATE TABLE MatchItems (
id NUMBER(5) CONSTRAINT MatchItems_pk PRIMARY KEY,
MatchSet Number(5),
Value VARCHAR(5),
IsNeg NUMBER(1) NOT NULL CHECK (IsNeg in (0,1))
);
INSERT INTO MatchItems VALUES (1, 1, 'M', 0);
INSERT INTO MatchItems VALUES (2, 1, 'I', 0);
INSERT INTO MatchItems VALUES (3, 1, 'C', 0);
INSERT INTO MatchItems VALUES (4, 1, 'K', 0);
INSERT INTO MatchItems VALUES (5, 1, 'E', 0);
INSERT INTO MatchItems VALUES (6, 1, 'Y', 0);
INSERT INTO MatchItems VALUES (7, 2, 'C', 0);
INSERT INTO MatchItems VALUES (8, 2, 'A', 0);
INSERT INTO MatchItems VALUES (9, 3, 'A', 0);
INSERT INTO MatchItems VALUES (10, 3, 'T', 0);
INSERT INTO MatchItems VALUES (11, 4, 'S', 1);
INSERT INTO MatchItems VALUES (12, 4, 'A', 0);
INSERT INTO MatchItems VALUES (13, 4, 'K', 1);
INSERT INTO MatchItems VALUES (14, 5, 'A', 0);
INSERT INTO MatchItems VALUES (15, 5, 'T', 0);
SELECT
MatchItems.MatchSet,
DataValues.Visit_id,
GpMatchItems.Count TgtCount,
Count(MatchItems.Id),
sum(MatchItems.IsNeg)
FROM DataValues
LEFT JOIN MatchItems ON MatchItems.Value = DataValues.Value
--AND MatchItems.MatchSet = 4
LEFT JOIN (SELECT
MatchItems.MatchSet,
count(*) Count
FROM MatchItems
WHERE
MatchItems.IsNeg = 0
GROUP BY
MatchItems.MatchSet) GpMatchItems ON GpMatchItems.MatchSet = MatchItems.MatchSet
HAVING
Count(MatchItems.Id) = GpMatchItems.Count
AND sum(MatchItems.IsNeg) = 0
GROUP BY
MatchItems.MatchSet,
DataValues.Visit_id,
GpMatchItems.Count
How can I improve the performance of this query where the DataValues table contains 100m records, and MatchItems may include a collection of 50 sets each of 2 - 20 values?
You can try this version using Analytic functions and see if it performs any better. This query removes the subquery GpMatchItems that you are joining with.
SELECT DISTINCT matchset,
visit_id,
tgtcount,
match_visit_count,
isneg_sum
FROM (SELECT MatchItems.MatchSet,
DataValues.Visit_id,
COUNT (DISTINCT CASE MatchItems.IsNeg WHEN 0 THEN MatchItems.id ELSE NULL END)
OVER (PARTITION BY MatchItems.MatchSet)
AS tgtcount,
COUNT (*) OVER (PARTITION BY MatchItems.MatchSet, DataValues.Visit_id)
AS match_visit_count,
SUM (MatchItems.IsNeg) OVER (PARTITION BY MatchItems.MatchSet, DataValues.Visit_id)
AS isneg_sum
FROM DataValues LEFT JOIN MatchItems ON MatchItems.VALUE = DataValues.VALUE)
WHERE tgtcount = match_visit_count AND isneg_sum = 0;
I have adjusted EJ's suggestion to include a LEFT JOIN to collect the tgtCount to identify the total number of good matches required in each MatchSet:
SELECT DISTINCT matchset,
visit_id,
tgtcount,
match_visit_count,
isneg_sum
GpMatchItems.count tgtCount
FROM
COUNT (*) OVER (PARTITION BY MatchItems.MatchSet, DataValues.Visit_id)
AS match_visit_count,
SUM (MatchItems.IsNeg) OVER (PARTITION BY MatchItems.MatchSet, DataValues.Visit_id)
AS isneg_sum
FROM DataValues
LEFT JOIN MatchItems ON MatchItems.VALUE = DataValues.VALUE)
LEFT JOIN ( SELECT
MatchItems.MatchSet,
count(*) Count
FROM MatchItems
WHERE MatchItems.IsNeg = 0
GROUP BY
MatchItems.MatchSet) GpMatchItems
ON GpMatchItems.MatchSet = MatchItems.MatchSet
)
WHERE
tgtcount = match_visit_count
AND isneg_sum = 0;
SQL Fiddle with schema and my intial attempt.
CREATE TABLE person
([firstname] varchar(10), [surname] varchar(10), [dob] date, [personid] int);
INSERT INTO person
([firstname], [surname], [dob] ,[personid])
VALUES
('Alice', 'AA', '1/1/1990', 1),
('Alice', 'AA', '1/1/1990', 2),
('Bob' , 'BB', '1/1/1990', 3),
('Carol', 'CC', '1/1/1990', 4),
('Alice', 'AA', '1/1/1990', 5),
('Kate' , 'KK', '1/1/1990', 6),
('Kate' , 'KK', '1/1/1990', 7)
;
CREATE TABLE person_membership
([personid] int, [personstatus] varchar(1), [memberid] int);
INSERT INTO person_membership
([personid], [personstatus], [memberid])
VALUES
(1, 'A', 10),
(2, 'A', 20),
(3, 'A', 30),
(3, 'A', 40),
(4, 'A', 50),
(4, 'A', 60),
(5, 'T', 70),
(6, 'A', 80),
(7, 'A', 90);
CREATE TABLE membership
([membershipid] int, [memstatus] varchar(1));
INSERT INTO membership
([membershipid], [memstatus])
VALUES
(10, 'A'),
(20, 'A'),
(30, 'A'),
(40, 'A'),
(50, 'T'),
(60, 'A'),
(70, 'A'),
(80, 'A'),
(90, 'T');
There are three tables (as per the fiddle above). Person table contains duplicates, same people entered more than once, for the purpose of this exercise we assume that a combination of the first name, surname and DoB is enough to uniquely identify a person.
I am trying to build a query which will show duplicates of people (first name+surname+Dob) with two or more active entries in the Person table (person_membership.person_status=A) AND two or more active memberships (membership.mestatus=A).
Using the example from SQL Fiddle, the result of the query should be just Alice (two active person IDs, two active membership IDs).
I think I'm making progress with the following effort but it looks rather cumbersome and I need to remove Katie from the final result - she doesn't have a duplicate membership.
SELECT q.firstname, q.surname, q.dob, p1.personid, m.membershipid
FROM
(SELECT
p.firstname,p.surname,p.dob, count(*) as cnt
FROM
person p
GROUP BY
p.firstname,p.surname,p.dob
HAVING COUNT(1) > 1) as q
INNER JOIN person p1 ON q.firstname=p1.firstname AND q.surname=p1.surname AND q.dob=p1.dob
INNER JOIN person_membership pm ON p1.personid=pm.personid
INNER JOIN membership m ON pm.memberid = m.membershipid
WHERE pm.personstatus = 'A' AND m.memstatus = 'A'
Since you are using SQL Server windows function will be handy for this scenario. The following will give you the expected output.
SELECT firstname,surname,dob,personid,memberid
from(
SELECT firstname,surname,dob,p.personid,memberid
,Rank() over(partition by p.firstname,p.surname,p.dob order by p.personid) rnasc
,Rank() over(partition by p.firstname,p.surname,p.dob order by p.personid desc) rndesc
FROM [StagingGRG].[dbo].[person] p
INNER JOIN person_membership pm ON p.personid=pm.personid
INNER JOIN membership m ON pm.memberid = m.membershipid
where personstatus='A' and memstatus='A')a
where a.rnasc+rndesc>2
You have to add Group by and Having clause to return duplicate items only-
SELECT
person.firstname,person.surname,person.dob
FROM
person, person_membership, membership
WHERE
person.personid=person_membership.personid AND person_membership.memberid = membership.membershipid
AND
person_membership.personstatus = 'A' AND membership.memstatus = 'A'
GROUP BY
person.firstname,person.surname,person.dob
HAVING COUNT(1) > 1
I have this table structure and I want a query that needs to be returning opposite side columns values.
CREATE TABLE TransactionDetail
(
ID NUMERIC NOT NULL PRIMARY KEY,
TransactionCode bigint,
COATitle NVARCHAR(50),
DrAmount NUMERIC,
CrAmount NUMERIC
);
INSERT INTO TransactionDetail VALUES (1, 1, 'Overtime', '2500', NULL);
INSERT INTO TransactionDetail VALUES (2, 1, 'Internship', NULL, '1500');
INSERT INTO TransactionDetail VALUES (3, 1, 'Medical', NULL, '1000');
INSERT INTO TransactionDetail VALUES (4, 2, 'Internship', '1150', NULL);
INSERT INTO TransactionDetail VALUES (5, 2, 'Overtime', NULL, '1150');
INSERT INTO TransactionDetail VALUES (6, 3, 'Overtime', '600', NULL);
INSERT INTO TransactionDetail VALUES (7, 3, 'Refreshment', '400', NULL);
INSERT INTO TransactionDetail VALUES (8, 3, 'Car Loan', '200', NULL);
INSERT INTO TransactionDetail VALUES (9, 3, 'Office Expenses', NULL, '1200');
If I pass parameter with value Overtime then it should return following rows
SELECT COATitle, DrAmount, CrAmount
FROM TransactionDetail
WHERE COATitle <> Overtime
Internship NULL 1500
Medical NULL 1000
Internship 1150 NULL
Office Expenses NULL 1200
The logic is against each transaction if the selected Account is on Debit side it should print the Credit side accounts and if the selected Account is on Credit side it should print Debit side accounts against that specific TransactionCode
The following code gives the desired result. It does that by checking to see whether the provided parameter is a debit or credit (or doesn't exist) in the current transaction, then only displays the reverse as specified.
declare #Parameter nvarchar(50) = 'Overtime'
declare #Trans TABLE
(
ID NUMERIC NOT NULL,
TransactionCode bigint,
COATitle NVARCHAR(50),
DrAmount NUMERIC,
CrAmount NUMERIC
);
INSERT INTO #Trans VALUES (1, 1, 'Overtime', '2500', NULL);
INSERT INTO #Trans VALUES (2, 1, 'Internship', NULL, '1500');
INSERT INTO #Trans VALUES (3, 1, 'Medical', NULL, '1000');
INSERT INTO #Trans VALUES (4, 2, 'Internship', '1150', NULL);
INSERT INTO #Trans VALUES (5, 2, 'Overtime', NULL, '1150');
INSERT INTO #Trans VALUES (6, 3, 'Overtime', '600', NULL);
INSERT INTO #Trans VALUES (7, 3, 'Refreshment', '400', NULL);
INSERT INTO #Trans VALUES (8, 3, 'Car Loan', '200', NULL);
INSERT INTO #Trans VALUES (9, 3, 'Office Expenses', NULL, '1200');
select TransactionCode, COATitle, DrAmount, CrAmount
from (
SELECT TransactionCode, COATitle, DrAmount, CrAmount
, case when exists (select 1 from #Trans T1 where T1.TransactionCode = T.TransactionCode and T1.COATitle = #Parameter and DrAmount is not null) then 1
when exists (select 1 from #Trans T1 where T1.TransactionCode = T.TransactionCode and T1.COATitle = #Parameter and CrAmount is not null) then -1
else 0 end TransSign
FROM #Trans T
WHERE COATitle <> #Parameter
) X
where (TransSign = -1 and DrAmount is not null)
or (TransSign = 1 and CrAmount is not null)
I'm trying to create a calculated column in SQL. Basically I need to get a set of distinct dates and determine how many customers there are in the population on that particular date. The result should be something like:
Date______| Customers
2016-01-01 | 1
2016-01-01 | 2
2016-01-05 | 3
2016-02-09 | 4
etc.
I created a sample database & data (using MySQL as I don't have permission to create tables in our Oracle dbs) with the following script:
create database customer_example;
use customer_example;
create table customers (
customer_id int not null primary key,
customer_name varchar(255) not null,
term_date DATE);
create table employee (
employee_id int not null primary key,
employee_name varchar(255) not null);
create table cust_emp (
ce_id int not null AUTO_INCREMENT,
emp_id int not null,
cust_id int not null,
start_date date,
end_date date,
deleted_yn boolean,
primary key (emp_id, cust_id, ce_id),
foreign key (cust_id) references customers(customer_id),
foreign key (emp_id) references employee(employee_id));
insert into customers (customer_id, customer_name)
values (1, 'Bobby Tables'), (2, 'Grover Cleveland'), (3, 'Chester Arthur'), (4, 'Jan Bush'), (5, 'Emanuel Porter'), (6, 'Darren King'), (7, 'Casey Mcguire'), (8, 'Robin Simpson'), (9, 'Robin Tables'), (10, 'Mitchell Arnold');
insert into customers (customer_id, customer_name, term_date)
values (11, 'Terrell Graves', '2017-01-01'), (12, 'Richard Wagner', '2016-10-31'), (13, 'Glenn Saunders', '2016-11-19'), (14, 'Bruce Irvin', '2016-03-11'), (15, 'Glenn Perry','2016-06-06'), (16, 'Hazel Freeman', '2016-07-10'),
(17, 'Martin Freeman', '2016-02-11'), (18, 'Morgan Freeman', '2017-02-01'), (19, 'Dirk Drake', '2017-01-12'), (20, 'Fraud Fraud', '2016-12-31');
insert into employee (employee_id, employee_name)
values (1000, 'Cedrick French'), (1001, 'Jane Phillips'), (1002, 'Brian Green'), (1003, 'Shawn Brooks'), (1004, 'Clarence Thomas');
insert into cust_emp (emp_id, cust_id, start_date, end_date)
values (1000, 1, '2016-01-01', '2016-02-01'), (1000, 1, '2016-02-01', '2016-02-01'), (1000, 2,'2016-01-05', '2016-01-16'),(1000, 3,'2016-02-09', '2016-03-14'),(1000, 4,'2016-03-20', '2016-04-23'),
(1000, 5,'2016-01-01', '2016-01-16'),(1000, 6,'2016-01-01', '2016-01-16'),(1004, 7, '2016-01-14', '206-01-16'),
(1004, 8, '2016-01-13', '2016-01-16'),(1004, 9, '2016-01-05', '2016-01-16'), (1003, 12, '2016-04-21', '2016-11-30');
insert into cust_emp (emp_id, cust_id, start_date, deleted_yn)
values (1002, 11, '2016-04-10', TRUE),(1003, 10, '2016-01-16', FALSE), (1004, 12, '2016-04-20', TRUE), (1004, 12, '2016-04-19', FALSE), (1003, 13, '2016-06-06', TRUE), (1002, 14, '2016-06-10', TRUE),
(1004, 15, '2016-03-25', TRUE), (1004, 17, '2016-01-02', TRUE), (1004, 18, '2017-01-01', TRUE), (1004, 19, '2016-11-13', TRUE), (1004, 20, '2016-03-10', TRUE), (1004, 16, '2016-05-13', TRUE);
insert into cust_emp (emp_id, cust_id, start_date)
values (1002, 1, '2016-02-01'), (1004, 2, '2016-01-16'),(1003, 3, '2016-03-14'),(1002, 4, '2016-04-23'),(1004, 5, '2016-01-16'),(1002, 6, '2016-01-16'),(1004, 7, '2016-01-16'),
(1004, 8, '2016-01-16'),(1002, 9, '2016-01-16'), (1004, 10, '2016-01-16');
The following SQL works fine in MySQL but when I try it in Oracle, I get an 'invalid identifier' on 'dates':
select distinct(ce.start_date) as dates,
(select count(distinct(c.customer_id))
from customers c
inner join cust_emp ce on c.customer_id = ce.cust_id
where ce.start_date < dates
and (ce.end_date > dates or (ce.deleted_yn = false or ce.deleted_yn is null))
and (c.term_date > dates or c.term_date is null)
)
from cust_emp as ce;
It seems as though this is because the dates is too far in a subquery. I've tried a CTE as well, but that seems to have the same issue as it gave the same error. How can I re-write this so that I can assess how many customers were there for each date in Oracle?
Huh?
Isn't this what you want?
select ce.dates as dates, count(distinct c.customer_id)
from cust_emp ce join
customers c
on c.customer_id = ce.cust_id
where ce.start_date < ce.dates and
(ce.end_date > ce.dates or ce.deleted_yn = false or ce.deleted_yn is null) and
(c.term_date > ce.dates or c.term_date is null)
group by ce.dates
order by ce.dates;
I don't really understand the use of the subquery with select distinct. The logic you describe is more easily understood as a simple aggregation.
I'm not sure where dates comes from. It is not in your data model, but it is in your sample query.