How to run multiple SQL queries whose values change? - sql

Let's say I have a database of Amazon customers who made purchases in the last year. It is pretty detailed and has columns like name, age, zip code, income level, favorite color, food, music, etc. Now, let's say I run a query such that I return all Amazon customers who bought Book X.
SELECT NAME, AGE, ZIPCODE, INCOME, FAVECOLOR, FAVEFOOD, FAVEMUSIC
FROM [Amazon].[dbo].[Customers]
WHERE BOOK = 'X'
This query will return a bunch of customers who bought Book X. Now, I want to iterate through each of those results (iterate through each customer) and create a query based on each customer's individual age, zipcode, and income.
So if the first result is Bob, age 32, lives in zipcode 90210, makes $45,000 annually, create a query to find all others like Bob who share the same age, zipcode, and income. If the second result is Mary, age 41, lives in zipcode 10004, makes $55,000 annually, create a query to find all others like Mary who share the same age, zipcode, and income.
How do I iterate through customers who bought Book X and run multiple queries whose values (age, zipcode, income) are changing? In terms of viewing the results, it'd be great if I could see Bob, followed by all customers who are like Bob, then Mary, and all customers who are like Mary.
Is this even possible in SQL? I know how to do this in C# (for/next loops with if/then statements inside) but am new to SQL, and the data is in SQL.
I use SQL Server 2008.

If I understood your requirement correctly, a nested query should do the job. Something like this:
SELECT DISTINCT a.NAME, a.AGE, a.ZIPCODE, a.INCOME, a.FAVECOLOR, a.FAVEFOOD, a.FAVEMUSIC
FROM [Amazon].[dbo].[Customers] a, (SELECT NAME, AGE, ZIPCODE, INCOME, FAVECOLOR, FAVEFOOD, FAVEMUSIC
FROM [Amazon].[dbo].[Customers]
WHERE BOOK = 'X' AND NAME = 'Bob') b
WHERE a.BOOK = 'X' AND a.AGE = b.AGE AND a.ZIPCODE = b.ZIPCODE AND a.INCOME = b.INCOME
EDIT: A generic query [this one lists the matches for all users]:
SELECT DISTINCT a.NAME, a.AGE, a.ZIPCODE, a.INCOME, a.FAVECOLOR, a.FAVEFOOD, a.FAVEMUSIC
FROM [Amazon].[dbo].[Customers] a, (SELECT DISTINCT NAME, AGE, ZIPCODE, INCOME, FAVECOLOR, FAVEFOOD, FAVEMUSIC, BOOK
FROM [Amazon].[dbo].[Customers]
WHERE BOOK = 'X') b
WHERE a.BOOK = b.BOOK AND a.AGE = b.AGE AND a.ZIPCODE = b.ZIPCODE AND a.INCOME = b.INCOME
ORDER BY a.NAME

Something like this can do it in one query:
;WITH cteSource as
(
SELECT NAME, AGE, ZIPCODE, INCOME, FAVECOLOR, FAVEFOOD, FAVEMUSIC
FROM [Amazon].[dbo].[Customers]
WHERE BOOK = 'X'
)
SELECT sr.NAME AS SrcName, cu.NAME AS LikeName
FROM [Amazon].[dbo].[Customers] AS cu
JOIN cteSource As sr
ON cu.AGE = sr.AGE
And cu.ZIPCODE = sr.ZIPCODE
And cu.INCOME = sr.INCOME
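As a rough illustration of the single-query approach, here is a minimal sketch that runs the same kind of CTE join against an in-memory SQLite database from Python. The sample rows, the reduced column set, and the name comparison that keeps a buyer from matching themselves (which assumes names are unique) are assumptions made for the example, not part of the original schema. Sorting by the source name puts each buyer next to the customers who resemble them.

```python
import sqlite3

# Hypothetical, cut-down Customers table with made-up rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (Name TEXT, Age INT, ZipCode TEXT, Income INT, Book TEXT);
INSERT INTO Customers VALUES
  ('Bob',   32, '90210', 45000, 'X'),
  ('Mary',  41, '10004', 55000, 'X'),
  ('Alice', 32, '90210', 45000, 'Y'),
  ('Carl',  41, '10004', 55000, 'Z'),
  ('Dave',  29, '60601', 30000, 'Y');
""")

# One set-based query: join every Book X buyer to the customers who share
# the buyer's age, zip code, and income.
like_rows = conn.execute("""
WITH Source AS (
    SELECT Name, Age, ZipCode, Income
    FROM Customers
    WHERE Book = 'X'
)
SELECT sr.Name AS SrcName, cu.Name AS LikeName
FROM Source AS sr
JOIN Customers AS cu
  ON cu.Age = sr.Age
 AND cu.ZipCode = sr.ZipCode
 AND cu.Income = sr.Income
WHERE cu.Name <> sr.Name        -- assumption: names are unique
ORDER BY sr.Name, cu.Name
""").fetchall()

for src_name, like_name in like_rows:
    print(src_name, "->", like_name)
```

With a real key column (an Id, say) the self-match filter would compare keys rather than names.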

Something like this will let you chase related customers to an arbitrary degree of separation (5 here). By constructing the JOINs appropriately you can also do things like match income within a range.
with Book as (
select Id, Name, Age, ZIPCode, Income -- ...
from Amazon.dbo.Customers
where Book = 'X' ),
RelatedCustomers as (
select C.Id, C.Name, C.Age, C.ZIPCode, C.Income, 1 as Depth -- ...
from Amazon.dbo.Customers as C inner join
Book as B on B.Id <> C.Id and Abs( B.Income - C.Income ) < 2000 -- and ...
union all
select C.Id, C.Name, C.Age, C.ZIPCode, C.Income, RC.Depth + 1-- ...
from Amazon.dbo.Customers as C inner join
RelatedCustomers as RC on RC.Id <> C.Id and Abs( RC.Income - C.Income ) < 2000 -- and ...
where Depth < 5 )
select *
from RelatedCustomers

I think you need two separate queries: the first brings back the customers and, once a customer such as Bob is selected, a second query is run based on Bob's attributes.
A simple example would be a forms application that has two grids. The first displays a list of the users. When you select one of the users the second grid is populated with the results of the second query.
The second query would be something like:
SELECT NAME, AGE, ZIPCODE, INCOME, FAVECOLOR, FAVEFOOD, FAVEMUSIC
FROM [Amazon].[dbo].[Customers]
WHERE Age = @BobsAge AND ZipCode = @BobsZipCode AND Income = @BobsIncome
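The two-query idea can be sketched in client code roughly as follows. This is only an illustration in Python against an in-memory SQLite database: the sample rows are made up, SQLite's `?` placeholders stand in for the parameters a SQL Server client would bind, and excluding the buyer by name assumes names are unique.

```python
import sqlite3

# Hypothetical, cut-down Customers table with made-up rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (Name TEXT, Age INT, ZipCode TEXT, Income INT, Book TEXT);
INSERT INTO Customers VALUES
  ('Bob',   32, '90210', 45000, 'X'),
  ('Mary',  41, '10004', 55000, 'X'),
  ('Alice', 32, '90210', 45000, 'Y'),
  ('Carl',  41, '10004', 55000, 'Z');
""")

# First query: the customers who bought Book X.
buyers = conn.execute(
    "SELECT Name, Age, ZipCode, Income FROM Customers "
    "WHERE Book = 'X' ORDER BY Name"
).fetchall()

# Second query: run once per buyer, with that buyer's values bound as parameters.
matches = {}
for name, age, zipcode, income in buyers:
    rows = conn.execute(
        "SELECT Name FROM Customers "
        "WHERE Age = ? AND ZipCode = ? AND Income = ? AND Name <> ? "
        "ORDER BY Name",
        (age, zipcode, income, name),
    ).fetchall()
    matches[name] = [row[0] for row in rows]

for name, similar in matches.items():
    print(name, "is like", similar)
```

This issues one round trip per buyer, so for large result sets a single join is usually the better fit.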

It sounds like you want a simple self-join:
SELECT
MatchingCustomers.NAME,
MatchingCustomers.AGE,
MatchingCustomers.ZIPCODE,
MatchingCustomers.INCOME,
MatchingCustomers.FAVECOLOR,
MatchingCustomers.FAVEFOOD,
MatchingCustomers.FAVEMUSIC
FROM
[Amazon].[dbo].[Customers] SourceCustomer
LEFT JOIN [Amazon].[dbo].[Customers] MatchingCustomers
ON SourceCustomer.Age = MatchingCustomers.Age
AND SourceCustomer.ZipCode = MatchingCustomers.ZipCode
AND SourceCustomer.Income = MatchingCustomers.Income
WHERE
SourceCustomer.Book = 'X'
If you want to see all the source customers and all of their matches in a single result set, you can select the SourceCustomer columns as well:
SELECT
SourceCustomer.Name SourceName,
SourceCustomer.Age SourceAge,
SourceCustomer.ZipCode SourceZipCode,
SourceCustomer.Income SourceIncome,
MatchingCustomers.NAME,
MatchingCustomers.AGE,
MatchingCustomers.ZIPCODE,
MatchingCustomers.INCOME,
MatchingCustomers.FAVECOLOR,
MatchingCustomers.FAVEFOOD,
MatchingCustomers.FAVEMUSIC
FROM
[Amazon].[dbo].[Customers] SourceCustomer
LEFT JOIN [Amazon].[dbo].[Customers] MatchingCustomers
ON SourceCustomer.Age = MatchingCustomers.Age
AND SourceCustomer.ZipCode = MatchingCustomers.ZipCode
AND SourceCustomer.Income = MatchingCustomers.Income
WHERE
SourceCustomer.Book = 'X'

Related

Access SQL Group by with condition

I'm using MS Access for the following task (due to office restrictions). I'm quite new to SQL.
I have the following table:
I want to select all stores grouped by street, zip and place. But I only want to group them if the SquareSum (after GROUP BY) is < 1000. Rue de gare 2 should be grouped, while Bahnhofstrasse 23 should be separate lines.
As far as I know, MS Access doesn't allow a CASE statement. So my query looks like this:
SELECT
Street,
ZIP,
Place,
Sum(Square) AS SumSquare,
FROM Table1
SWITCH (SumSquare > 1000, GROUP BY (Street, ZIP, Place))
I also tried:
GROUP BY
SWITCH (SumSquare > 1000, (Street, ZIP, Place))
But it keeps telling me I have a syntax error. Could someone please help me?

In Access, I would do this with several queries.
This would be easier to do if you had an id on the rows (such as an autonumber).
First query identifies the streets that should be summed.
query: SumTheseStreets
SELECT
Street,
ZIP,
Place,
Sum(Square) AS SumSquare
FROM Table1
GROUP BY Street, ZIP, Place
HAVING sum(Square) < 1000
Note the HAVING clause: it works like a WHERE clause, but it is applied after the GROUP BY, so it can filter on the SUM.
Second query identifies the other rows (notes on this one below):
query: StreetsNotSummed
SELECT
Table1.Street,
Table1.ZIP,
Table1.Place,
Table1.Square AS SumSquare
FROM Table1
LEFT JOIN SumTheseStreets ON (Table1.Street = SumTheseStreets.Street) AND (Table1.ZIP = SumTheseStreets.ZIP) AND (Table1.Place = SumTheseStreets.Place)
WHERE SumTheseStreets.Street IS NULL;
A couple of notes:
I've called the field SumSquare because I want it to be the same name as the SumSquare field in the first query
It uses the first query as one of the input "tables"
This uses a LEFT JOIN, which means "give me all of the rows in the first table (Table1) and, if any rows in the second table (SumTheseStreets) match, put those in as well",
but then the WHERE clause filters out the rows that DO match.
So this query only lists the streets that you want NOT summed.
So now you need a third query.
This simply includes all of the rows in both of those queries.
I'm not too sure on the Access syntax on this one, but there's a union query wizard if this isn't right.
Query: TheAnswerRequired
SELECT
Street,
ZIP,
Place,
SumSquare
FROM SumTheseStreets
UNION
SELECT
Street,
ZIP,
Place,
SumSquare
FROM StreetsNotSummed
(Use UNION ALL rather than UNION if identical rows must be kept, since UNION removes duplicates.)
Good luck.

You can use UNION ALL:
SELECT ts.*
FROM (SELECT Street, Zip, Place, SUM(Square) as SumSquare
FROM Table1
GROUP BY Street, Zip, Place
) as ts
WHERE ts.SumSquare < 1000
UNION ALL
SELECT t1.Street, t1.Zip, t1.Place, t1.Square
FROM Table1 as t1 INNER JOIN
(SELECT Street, Zip, Place, SUM(Square) as SumSquare
FROM Table1
GROUP BY Street, Zip, Place
) as ts
ON t1.Street = ts.Street AND t1.Zip = ts.Zip and t1.Place = ts.Place
WHERE ts.SumSquare >= 1000
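To see the conditional grouping in action, here is a small sketch that runs the same UNION ALL idea in an in-memory SQLite database from Python rather than in Access. The table contents are invented for the example (the original table was not preserved), and the syntax is SQLite's, so treat it as an illustration of the logic only.

```python
import sqlite3

# Made-up sample data: one address whose total is under 1000 and one whose
# total is 1000 or more.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Table1 (Street TEXT, ZIP TEXT, Place TEXT, Square INT);
INSERT INTO Table1 VALUES
  ('Rue de gare 2',     '1000', 'Lausanne', 300),
  ('Rue de gare 2',     '1000', 'Lausanne', 400),
  ('Bahnhofstrasse 23', '8000', 'Zurich',   600),
  ('Bahnhofstrasse 23', '8000', 'Zurich',   700);
""")

rows = conn.execute("""
-- Part 1: groups whose total is below 1000 collapse into one row.
SELECT Street, ZIP, Place, SUM(Square) AS SumSquare
FROM Table1
GROUP BY Street, ZIP, Place
HAVING SUM(Square) < 1000
UNION ALL
-- Part 2: rows belonging to the remaining groups are returned one by one.
SELECT t1.Street, t1.ZIP, t1.Place, t1.Square
FROM Table1 AS t1
JOIN (SELECT Street, ZIP, Place
      FROM Table1
      GROUP BY Street, ZIP, Place
      HAVING SUM(Square) >= 1000) AS big
  ON t1.Street = big.Street AND t1.ZIP = big.ZIP AND t1.Place = big.Place
ORDER BY 1, 4
""").fetchall()

for row in rows:
    print(row)
```

UNION ALL matters here: plain UNION would merge two ungrouped rows that happen to have the same street, zip, place, and square value.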

finding value in a list created via subquery

Thank you Stack-Community,
This is probably obvious for most of you but I just don't understand why it doesn't work.
I am using the Northwind database and, let's say, I am trying to find the countries that do not occur exactly twice, i.e. that are listed either more than twice or less often.
I already figured out other ways of doing it with a having statement, so I am not looking for alternatives but trying to understand why my initial attempt is not working.
I look at it and look at it and it makes perfect sense to me. Can someone explain what's the problem?
SELECT country, count(country)
FROM Customers
WHERE 2 not in (SELECT count(country) FROM Customers GROUP BY country)
GROUP BY country
;

You need a correlated subquery:
SELECT country, count(country)
FROM Customers c
WHERE 2 not in (SELECT count(country) FROM Customers c2
WHERE c2.country = c.country )
GROUP BY country;
Otherwise you get something like:
SELECT country, count(country)
FROM Customers c
WHERE 2 not in (1,2,3) -- false in every case and empty resultset
GROUP BY country;
Imagine that you have:
1, 'UK' -- 1
2, 'DE' -- 2
3, 'DE'
4, 'RU' -- 1
Now you will get equivalent of
SELECT country, count(country)
FROM Customers c
WHERE 2 not in (1,2,1) -- false in every case and empty resultset
GROUP BY country;
-- 0 rows selected
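The difference can be checked with a short sketch using the four sample rows above in an in-memory SQLite database (Python is used here only as a convenient way to run the SQL; the two-column table layout is an assumption for the example).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (id INTEGER, country TEXT);
INSERT INTO Customers VALUES (1, 'UK'), (2, 'DE'), (3, 'DE'), (4, 'RU');
""")

# Uncorrelated: the subquery yields the counts of ALL countries, so the
# list contains 2 and the NOT IN test fails for every row.
uncorrelated = conn.execute("""
SELECT country, COUNT(country)
FROM Customers
WHERE 2 NOT IN (SELECT COUNT(country) FROM Customers GROUP BY country)
GROUP BY country
""").fetchall()

# Correlated: the subquery counts only the rows for the current row's country.
correlated = conn.execute("""
SELECT country, COUNT(country)
FROM Customers c
WHERE 2 NOT IN (SELECT COUNT(country)
                FROM Customers c2
                WHERE c2.country = c.country)
GROUP BY country
ORDER BY country
""").fetchall()

print("uncorrelated:", uncorrelated)
print("correlated:  ", correlated)
```

The first query returns no rows because one country (DE) occurs twice, which is enough to make the condition false everywhere; the second evaluates the count per country.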

Oracle Query - Use of Analytical functions

Assume we have loaded a flat file with patient diagnosis data into a table called “Data”. The table structure is:
Create table Data (
Firstname varchar(50),
Lastname varchar(50),
Date_of_birth datetime,
Medical_record_number varchar(20),
Diagnosis_date datetime,
Diagnosis_code varchar(20))
The data in the flat file looks like this:
'jane','jones','2/2/2001','MRN-11111','3/3/2009','diabetes'
'jane','jones','2/2/2001','MRN-11111','1/3/2009','asthma'
'jane','jones','5/5/1975','MRN-88888','2/17/2009','flu'
'tom','smith','4/12/2002','MRN-22222','3/3/2009','diabetes'
'tom','smith','4/12/2002','MRN-33333','1/3/2009','asthma'
'tom','smith','4/12/2002','MRN-33333','2/7/2009','asthma'
'jack','thomas','8/10/1991','MRN-44444','3/7/2009','asthma'
You can assume that no two patients have the same firstname, lastname, and date of birth combination. However one patient might have several visits on different days. These should all have the same medical record number.
The problem is this: Tom Smith has 2 different medical record numbers. Write a query that would always show all the patients who are like Tom Smith – patients with more than one medical record number.
I came up with the query below. It works perfectly fine, but I wanted to know if there is a better way to write it using Oracle analytic functions. Thank you in advance.
SELECT a.firstname,
a.lastname,
a.date_of_birth,
a.medical_record_number
FROM data a, data b
WHERE a.firstname = b.firstname
AND a.lastname = b.lastname
AND a.date_of_birth = b.date_of_birth
AND a.medical_record_number <> b.medical_record_number
GROUP BY a.firstname,
a.lastname,
a.date_of_birth,
a.medical_record_number

It is possible to do via analytic functions, but whether it's faster than doing the join in your query* or not depends on what data you have. You'd need to test.
with data (firstname, lastname, date_of_birth, medical_record_number, diagnosis_date, diagnosis_code)
as (select 'jane','jones','2/2/2001','MRN-11111',to_date('3/3/2009', 'mm/dd/yyyy'),'diabetes' from dual union all
select 'jane','jones','2/2/2001','MRN-11111',to_date('1/3/2009', 'mm/dd/yyyy'),'asthma' from dual union all
select 'jane','jones','5/5/1975','MRN-88888',to_date('2/17/2009', 'mm/dd/yyyy'),'flu' from dual union all
select 'tom','smith','4/12/2002','MRN-22222',to_date('3/3/2009', 'mm/dd/yyyy'),'diabetes' from dual union all
select 'tom','smith','4/12/2002','MRN-33333',to_date('1/3/2009', 'mm/dd/yyyy'),'asthma' from dual union all
select 'tom','smith','4/12/2002','MRN-33333',to_date('2/7/2009', 'mm/dd/yyyy'),'asthma' from dual union all
select 'jack','thomas','8/10/1991','MRN-44444',to_date('3/7/2009', 'mm/dd/yyyy'),'asthma' from dual),
-- end of mimicking your table and its data
res as (select firstname,
lastname,
date_of_birth,
medical_record_number,
count(distinct medical_record_number) over (partition by firstname, lastname, date_of_birth) cnt_med_rec_nums
from data)
select distinct firstname,
lastname,
date_of_birth,
medical_record_number
from res
where cnt_med_rec_nums > 1;
*By the way, the GROUP BY in your example query is not necessary; it would make more sense to switch it out for a DISTINCT, which makes your intent much clearer, since you want a distinct set of records.

You can probably simplify the query a bit using a HAVING clause rather than doing a self-join:
SELECT a.firstname,
a.lastname,
a.date_of_birth,
MIN(a.medical_record_number) lowest_medical_record_number,
MAX(a.medical_record_number) highest_medical_record_number
FROM data a
GROUP BY a.firstname,
a.lastname,
a.date_of_birth
HAVING COUNT( DISTINCT a.medical_record_number ) > 1
I'm returning the smallest and largest medical record number for each patient here (that's what I'd do if most of the patients with this problem have just two numbers rather than having dozens). You could return just one or you could return a comma-separated list of all the medical record numbers if you'd rather (which would probably make more sense if most of the bad folks have dozens of numbers).
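A quick way to try the HAVING version is to load the sample rows from the question into an in-memory SQLite database, as in the sketch below. It is run from Python for convenience, with the dates kept as plain text (an assumption for the example); the GROUP BY / HAVING COUNT(DISTINCT ...) part is standard SQL, and only the setup is SQLite-specific.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE Data (
    Firstname TEXT, Lastname TEXT, Date_of_birth TEXT,
    Medical_record_number TEXT, Diagnosis_date TEXT, Diagnosis_code TEXT)
""")
conn.executemany("INSERT INTO Data VALUES (?, ?, ?, ?, ?, ?)", [
    ("jane", "jones", "2/2/2001", "MRN-11111", "3/3/2009", "diabetes"),
    ("jane", "jones", "2/2/2001", "MRN-11111", "1/3/2009", "asthma"),
    ("jane", "jones", "5/5/1975", "MRN-88888", "2/17/2009", "flu"),
    ("tom", "smith", "4/12/2002", "MRN-22222", "3/3/2009", "diabetes"),
    ("tom", "smith", "4/12/2002", "MRN-33333", "1/3/2009", "asthma"),
    ("tom", "smith", "4/12/2002", "MRN-33333", "2/7/2009", "asthma"),
    ("jack", "thomas", "8/10/1991", "MRN-44444", "3/7/2009", "asthma"),
])

# One row per patient (name + date of birth) that has more than one
# distinct medical record number.
duplicates = conn.execute("""
SELECT Firstname, Lastname, Date_of_birth,
       MIN(Medical_record_number) AS lowest_mrn,
       MAX(Medical_record_number) AS highest_mrn
FROM Data
GROUP BY Firstname, Lastname, Date_of_birth
HAVING COUNT(DISTINCT Medical_record_number) > 1
""").fetchall()

print(duplicates)
```

The two Jane Jones entries have different dates of birth, so they count as different patients and neither is reported.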

DB2 SQL Join and Max value

The database I'm accessing has two tables I need to query using DB2 SQL, shown here as nametable and addresstable. The query is for finding all of the people with a certain balance due. The addresses are stored in a separate table to keep track of address changes. In addresstable, the latest address is determined by a sequence number (ADDRSEQUENCE). The AddressID field is present in both tables, and is what ties each person to specific addresses. The highest sequence number is the current address. I need that current address for each person and only that one. I know I'm going to have to use MAX somewhere for the sequence number, but I can't figure out how to position it given the join. Here's my current query, which of course returns all addresses...
SELECT NAMETABLE.ACCTNUM AS ACCOUNTNUMBER,
NAMETABLE.NMELASTBUS AS LASTNAME,
NAMETABLE.NAME_FIRST AS FIRSTNAME,
NAMETABLE.BALDUE AS BALANCEDUE,
ADDRESSTABLE.STREETNAME AS ADDR,
ADDRESSTABLE.ADDRLINE2 AS ADDRLINE2,
ADDRESSTABLE.CITYPARISH AS CITY,
ADDRESSTABLE.ADDRSTATE AS STATE,
ADDRESSTABLE.ZIPCODE AS ZIP,
ADDRESSTABLE.ADDIDSEQNO AS ADDRSEQUENCE
FROM NAMETABLE JOIN ADDRESSTABLE ON NAMETABLE.ADDRESSID = ADDRESSTABLE.ADDRESSID
WHERE NAMETABLE.BALANCEDUE >= '50.00'

You can do a sub-select on the MAX(ADDRSEQUENCE) like so:
SELECT
N.ACCTNUM AS ACCOUNTNUMBER
,N.NMELASTBUS AS LASTNAME
,N.NAME_FIRST AS FIRSTNAME
,N.BALDUE AS BALANCEDUE
,A.STREETNAME AS ADDR
,A.ADDRLINE2 AS ADDRLINE2
,A.CITYPARISH AS CITY
,A.ADDRSTATE AS STATE
,A.ZIPCODE AS ZIP
FROM NAMETABLE AS N
JOIN ADDRESSTABLE AS A
ON N.ADDRESSID = A.ADDRESSID
WHERE N.BALANCEDUE >= '50.00'
AND A.ADDRSEQUENCE = (
SELECT MAX(ADDRSEQUENCE)
FROM ADDRESSTABLE AS A2
WHERE A.ADDRESSID = A2.ADDRESSID
)
This is pretty quick in DB2.
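For readers who want to experiment with the pattern, here is a minimal sketch of the correlated MAX sub-select, run against an in-memory SQLite database from Python instead of DB2. The table and column names are simplified stand-ins and the rows are invented, so this shows the shape of the query and is not the real schema.

```python
import sqlite3

# Hypothetical, simplified versions of the two tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE NameTable (AcctNum INT, LastName TEXT, BalDue REAL, AddressId INT);
CREATE TABLE AddressTable (AddressId INT, AddrSequence INT, StreetName TEXT);
INSERT INTO NameTable VALUES
  (1, 'Smith', 75.0, 10),
  (2, 'Jones', 20.0, 20),
  (3, 'Lee',   50.0, 30);
INSERT INTO AddressTable VALUES
  (10, 1, '1 Old Rd'), (10, 2, '2 New Ave'),
  (20, 1, '5 Elm St'),
  (30, 1, '9 Oak Ln'), (30, 3, '7 Pine Ct'), (30, 2, '8 Mid Way');
""")

current = conn.execute("""
SELECT n.AcctNum, n.LastName, a.StreetName
FROM NameTable AS n
JOIN AddressTable AS a
  ON n.AddressId = a.AddressId
WHERE n.BalDue >= 50.00
  -- keep only the row with the highest sequence number for this AddressId
  AND a.AddrSequence = (SELECT MAX(a2.AddrSequence)
                        FROM AddressTable AS a2
                        WHERE a2.AddressId = a.AddressId)
ORDER BY n.AcctNum
""").fetchall()

for row in current:
    print(row)
```

Each qualifying account appears once, paired with the address that has the highest sequence number for its AddressId.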

You can use ROW_NUMBER with PARTITION BY to do this. Something like this:
with orderedaddress as (
select row_number() over (partition by ADDRESSID order by ADDRSEQUENCE desc) as rown,
STREETNAME,ADDRESSID, ... from ADDRESSTABLE
)
select NAMETABLE.ACCTNUM AS ACCOUNTNUMBER,
...
oa.STREETNAME
...
from NAMETABLE JOIN orderedaddress oa on NAMETABLE.ADDRESSID = oa.ADDRESSID
where oa.rown = 1
and NAMETABLE.BALANCEDUE >= '50.00'

SQL Nested Query Homework

Given:
InsuranceCompanies (cid, name, phone, address)
Doctors (did, name, specialty, address, phone, age, cid)
Patients (pid, name, address, phone, age, gender, cid)
Visits (vid, did, pid, date, description)
Where:
cid - Insurance Company code
did - doctor code
pid - patient code
vid - code of visit
And a task: find doctors (did, name) whose number of visits during this year is less than the average number of visits per doctor during this year.
My attempt is:
SELECT D.did, D. name
FROM Doctor D,Visit V
WHERE V.did = D.did and D.did = CV.did and CV.visits <
(SELECT AVG ( CV.visits)
FROM (SELECT V1.did AS did,COUNT(V1.vid) AS visits
FROM Visit V1
WHERE V1.date LIKE '%2012'
GROUP BY V1.did) AS CV)
A big thanks to Bridge, who shared the most beautiful and user-friendly SQL visualizer ever!
Database example: http://sqlfiddle.com/#!2/e85c7/3

Solution using views:
CREATE VIEW ThisYear AS
SELECT v.pid,v.vid,v.did
FROM Visits v
WHERE v.date LIKE '%2012';
CREATE VIEW DoctorsVisitCount AS
SELECT v.did, COUNT(v.vid) as c
FROM ThisYear v
GROUP BY v.did;
SELECT DISTINCT d.did, d.dname, dvc.c
FROM Doctors d, DoctorsVisitCount dvc
WHERE d.did = dvc.did
AND dvc.c < (SELECT AVG(dvc2.c)
FROM DoctorsVisitCount dvc2);
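The same logic can be tried without creating views by using a common table expression. The sketch below does that in an in-memory SQLite database from Python. The sample doctors and visits are invented, dates are stored as text ending in the year (matching the LIKE '%2012' filter above), and the doctor name column is called `name` as in the schema listed in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Doctors (did INT, name TEXT);
CREATE TABLE Visits (vid INT, did INT, date TEXT);
INSERT INTO Doctors VALUES (1, 'Dr. A'), (2, 'Dr. B'), (3, 'Dr. C');
INSERT INTO Visits VALUES
  (1, 1, '05/01/2012'), (2, 1, '17/02/2012'), (3, 1, '09/03/2012'),
  (4, 2, '11/04/2012'), (5, 2, '23/06/2011'),
  (6, 3, '02/05/2012'), (7, 3, '30/08/2012');
""")

below_average = conn.execute("""
WITH VisitCount AS (
    -- visits per doctor for the year in question
    SELECT did, COUNT(vid) AS c
    FROM Visits
    WHERE date LIKE '%2012'
    GROUP BY did
)
SELECT d.did, d.name, vc.c
FROM Doctors AS d
JOIN VisitCount AS vc ON vc.did = d.did
WHERE vc.c < (SELECT AVG(c) FROM VisitCount)
ORDER BY d.did
""").fetchall()

print(below_average)
```

Here the counts for 2012 are 3, 1, and 2, so the average is 2 and only the doctor with a single visit is returned. Doctors with no visits at all in the year do not appear in the count, so they are left out of both the average and the result; whether that is the intended reading of the task is a judgment call.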