Access SQL Group by with condition - sql

I'm using MS Access for the following task (due to office restrictions). I'm quite new to SQL.
I have the following table:
I want to select all stores grouped by street, ZIP and place. But I only want to group them if the SumSquare (after the GROUP BY) is < 1000. Rue de gare 2 should be grouped, while Bahnhofstrasse 23 should stay as separate lines.
As far as I know, MS Access doesn't allow a CASE statement. So my query looks like this:
SELECT
Street,
ZIP,
Place,
Sum(Square) AS SumSquare,
FROM Table1
SWITCH (SumSquare > 1000, GROUP BY (Street, ZIP, Place))
I also tried:
GROUP BY
SWITCH (SumSquare > 1000, (Street, ZIP, Place))
But it keeps telling me I have a syntax error. Could someone please help me?

In Access, I would do this with several queries.
This would be easier to do if you had an id on the rows (such as an autonumber).
First query identifies the streets that should be summed.
query: SumTheseStreets
SELECT
Street,
ZIP,
Place,
Sum(Square) AS SumSquare
FROM Table1
GROUP BY Street, ZIP, Place
HAVING sum(Square) < 1000
Note the HAVING clause: it works like a WHERE clause, but it is applied after the GROUP BY, so it can filter on aggregates such as Sum(Square).
Second query identifies the other rows (notes on this one below):
query: StreetsNotSummed
SELECT
Table1.Street,
Table1.ZIP,
Table1.Place,
Table1.Square AS SumSquare
FROM Table1
LEFT JOIN SumTheseStreets ON Table1.Street = SumTheseStreets.Street AND Table1.ZIP = SumTheseStreets.ZIP AND Table1.Place = SumTheseStreets.Place
WHERE SumTheseStreets.Street IS NULL;
A couple of notes:
I've called the field SumSquare because I want it to be the same name as the SumSquare field in the first query
It uses the first query as one of the input "tables"
This uses a LEFT JOIN, which means "give me all of the rows in the first table (Table1) and, if any rows in the second table (SumTheseStreets) match, put those in as well".
The WHERE ... IS NULL then filters out the rows that DO match.
So this query lists only the streets that you do NOT want summed.
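The LEFT JOIN ... IS NULL filter described in these notes can be tried out in a small sketch. This uses Python's sqlite3 module as a stand-in for Access (so the dialect is not identical), inlines the first query as a subquery, and the ZIP, Place and Square values are invented for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for the Access table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (Street TEXT, ZIP TEXT, Place TEXT, Square INTEGER);
    -- ZIP, Place and Square values are invented sample data.
    INSERT INTO Table1 VALUES
        ('Rue de gare 2',     '1000', 'Lausanne', 300),
        ('Rue de gare 2',     '1000', 'Lausanne', 400),
        ('Bahnhofstrasse 23', '8000', 'Zurich',   600),
        ('Bahnhofstrasse 23', '8000', 'Zurich',   700);
""")

# Keep only the rows whose street group is NOT in the "sum these" set:
# the LEFT JOIN leaves s.Street NULL for rows without a match.
not_summed = conn.execute("""
    SELECT t.Street, t.ZIP, t.Place, t.Square AS SumSquare
    FROM Table1 AS t
    LEFT JOIN (
        SELECT Street, ZIP, Place
        FROM Table1
        GROUP BY Street, ZIP, Place
        HAVING SUM(Square) < 1000
    ) AS s
      ON t.Street = s.Street AND t.ZIP = s.ZIP AND t.Place = s.Place
    WHERE s.Street IS NULL
    ORDER BY t.Square
""").fetchall()

for row in not_summed:
    print(row)
```

With this sample data only the two Bahnhofstrasse 23 rows come back, because their group total (1300) is not below 1000.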
So now you need a third query.
This simply includes all of the rows in both of those queries.
I'm not too sure on the Access syntax on this one, but there's a union query wizard if this isn't right.
Query: TheAnswerRequired
SELECT
Street,
ZIP,
Place,
SumSquare
FROM SumTheseStreets
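UNION ALL
SELECT
Street,
ZIP,
Place,
SumSquare
FROM StreetsNotSummed
(This should be UNION ALL rather than UNION: a plain UNION removes duplicate rows, so two unsummed stores at the same address with the same Square would collapse into one line.)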
Good luck.

You can use UNION ALL:
SELECT ts.*
FROM (SELECT Street, Zip, Place, SUM(Square) as SumSquare
FROM Table1
GROUP BY Street, Zip, Place
) as ts
WHERE ts.SumSquare < 1000
UNION ALL
SELECT t1.Street, t1.Zip, t1.Place, t1.Square
FROM Table1 as t1 INNER JOIN
(SELECT Street, Zip, Place, SUM(Square) as SumSquare
FROM Table1
GROUP BY Street, Zip, Place
) as ts
ON t1.Street = ts.Street AND t1.Zip = ts.Zip and t1.Place = ts.Place
WHERE ts.SumSquare >= 1000
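To see what the choice between UNION ALL and plain UNION does here, the following is a rough sketch using Python's sqlite3 module as a stand-in for Access. The sample rows are invented: two Bahnhofstrasse 23 stores deliberately share the same Square value, and the second branch lists its four columns explicitly so both branches have the same shape.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (Street TEXT, ZIP TEXT, Place TEXT, Square INTEGER);
    -- Invented sample data; both Bahnhofstrasse stores have Square = 650.
    INSERT INTO Table1 VALUES
        ('Rue de gare 2',     '1000', 'Lausanne', 300),
        ('Rue de gare 2',     '1000', 'Lausanne', 400),
        ('Bahnhofstrasse 23', '8000', 'Zurich',   650),
        ('Bahnhofstrasse 23', '8000', 'Zurich',   650);
""")

# Same query shape as above; only the set operator is swapped in.
QUERY = """
    SELECT ts.Street, ts.ZIP, ts.Place, ts.SumSquare
    FROM (SELECT Street, ZIP, Place, SUM(Square) AS SumSquare
          FROM Table1
          GROUP BY Street, ZIP, Place) AS ts
    WHERE ts.SumSquare < 1000
    {set_op}
    SELECT t1.Street, t1.ZIP, t1.Place, t1.Square
    FROM Table1 AS t1
    INNER JOIN (SELECT Street, ZIP, Place, SUM(Square) AS SumSquare
                FROM Table1
                GROUP BY Street, ZIP, Place) AS ts
      ON t1.Street = ts.Street AND t1.ZIP = ts.ZIP AND t1.Place = ts.Place
    WHERE ts.SumSquare >= 1000
"""

rows_all = conn.execute(QUERY.format(set_op="UNION ALL")).fetchall()
rows_distinct = conn.execute(QUERY.format(set_op="UNION")).fetchall()

print(len(rows_all))       # 3: one summed row plus both Bahnhofstrasse rows
print(len(rows_distinct))  # 2: the identical Bahnhofstrasse rows were merged
```

So if separate stores can have identical values in every selected column, UNION ALL is the variant that keeps them as separate lines.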


Find string from table in cell in BigQuery --> Query exceeded resource limits

I have two tables in BigQuery:
City List: Table: invertible-fin-XXX238.Reports.City
StationNames: invertible-fin-XXX238.Reports.Station
Most of the station names contain a city name. Now I want to extract the city from the Station table.
Here some example data:
City: Berlin
Stationname: inStore_Berlin_Alexanderplatz
Stationname: Berlin Schönefeld Airport
Stationname: Train Station Franchise Berlin
I tried the INSTR Function, but had no success (the INSTR works only with Legacy SQL and there I couldn’t use SUBSELECTS).
SELECT City,
INSTR((SELECT AdGroupName
FROM [invertible-fin-XXX238.Reports.City]),City) AS Match
FROM [invertible-fin-XXX238.Reports.Station]
Therefore I tried it with WHERE LIKE. Below the SQL Code:
SELECT a.City
FROM [invertible-fin-XXX238.Reports.City] a
CROSS JOIN [invertible-fin-XXX238.Reports.Station] b
WHERE b.Name LIKE '%' + a.City + '%'
GROUP BY a.City
But now the query is too computationally intensive and I get the error "Query exceeded resource limits for tier 1. Tier 18 or higher required."
Could someone please help me write a more resource-friendly query?
Thanks in advance,
Philipp
Below are a few of many possible versions for BigQuery Standard SQL:
#standardSQL
SELECT city, station
FROM `invertible-fin-XXX238.Reports.Station` AS s
JOIN `invertible-fin-XXX238.Reports.City` AS c
ON REPLACE(LOWER(station), LOWER(city), '') <> LOWER(station)
or
#standardSQL
SELECT city, station
FROM `invertible-fin-XXX238.Reports.Station` AS s
JOIN `invertible-fin-XXX238.Reports.City` AS c
ON LOWER(station) LIKE CONCAT('%',LOWER(city),'%')
You can remove the LOWER() function if the city names are spelled in the same case in both tables.
While the versions above look more straightforward, I would prefer the one below, as it gives you control over how the city is extracted from the station name. In the pattern r'([^ _]+)' you should list all characters that you observe acting as delimiters in the station column. This way a city is matched only when it appears as a whole token, not as part of a longer name.
Of course, you should first check whether you even need to worry about this.
#standardSQL
WITH tokens AS (
SELECT token, station
FROM `invertible-fin-XXX238.Reports.Station` AS s,
UNNEST(REGEXP_EXTRACT_ALL(LOWER(station), r'([^ _]+)')) token
)
SELECT city, station
FROM tokens AS s
JOIN `invertible-fin-XXX238.Reports.City` AS c
ON LOWER(city) = token
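The difference between the LIKE version and the token version can be illustrated outside BigQuery. This is only a sketch in plain Python using the same delimiter pattern; the function names are made up for the example, and "Berliner Tor" is an invented station name added to show a city that is merely part of a longer word.

```python
import re

cities = ["Berlin", "Hamburg"]
stations = [
    "inStore_Berlin_Alexanderplatz",
    "Berlin Schönefeld Airport",
    "Train Station Franchise Berlin",
    "Train Station Hamburg",
    "Berliner Tor",  # invented: contains "Berlin" only as part of a longer word
]


def match_by_substring(station, cities):
    """Mimics LOWER(station) LIKE CONCAT('%', LOWER(city), '%')."""
    lowered = station.lower()
    return [city for city in cities if city.lower() in lowered]


def match_by_token(station, cities):
    """Mimics the REGEXP_EXTRACT_ALL version: split on space and underscore."""
    tokens = set(re.findall(r"[^ _]+", station.lower()))
    return [city for city in cities if city.lower() in tokens]


for station in stations:
    print(station, match_by_substring(station, cities), match_by_token(station, cities))
```

Both functions agree on the first four stations; for "Berliner Tor" the substring match still reports Berlin while the token match reports nothing.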
I also wonder how the performance for a sub-query would be in this case. For instance:
WITH City AS(
SELECT 'Berlin' As Name UNION ALL
SELECT 'Hamburg'
),
StationNames AS(
SELECT 'inStore_Berlin_Alexanderplatz' AS Name UNION ALL
SELECT 'Berlin Schönefeld Airport' UNION ALL
SELECT 'Train Station Franchise Berlin' UNION ALL
SELECT 'Train Station Hamburg' UNION ALL
SELECT 'Train Station Pluton'
)
SELECT
Name StationName,
(SELECT Name FROM City c WHERE LOWER(s.Name) LIKE CONCAT('%', LOWER(c.Name), '%')) city
FROM StationNames s
Or in your case:
SELECT
Name StationName,
(SELECT Name FROM `invertible-fin-XXX238.Reports.City` c WHERE LOWER(s.Name) LIKE CONCAT('%', LOWER(c.Name), '%')) city
FROM `invertible-fin-XXX238.Reports.Station` s
I know it's common wisdom for most databases that a JOIN performs better than a sub-query, but BigQuery has lots of different optimization techniques for storing and querying data, so I was curious to know how different the performance would be in this case.

SQL Assignment about joining tables

I am working on a SQL assignment in Oracle. There are two tables.
table1 is called Person10:
fields include: ID, Fname, Lname, State, DOH, JobTitle, Salary, Cat.
table2 is called StateInfo:
fields include: State, Statename, Capital, Nickname, Pop2010, pop2000, pop1990, sqmiles.
Question:
Create a view named A10T2 that will display the StateName, Capital and Nickname of the states that have at least 25 people in the Person10 table with a Cat value of N and an annual salary between $75,000 and $125,000. The three column headings should be StateName, Capital and Nickname. The rows should be sorted by the name of the state.
What I have :
CREATE VIEW A10T2 AS
SELECT StateName, Capital, Nickname
FROM STATEINFO INNER JOIN PERSON10 ON
STATEINFO.STATE = PERSON10.STATE
WHERE Person10.CAT = 'N' AND
Person10.Salary in BETWEEN (75000 AND 125000) AND
count(Person10.CAT) >= 25
ORDER BY STATE;
It gives me an error saying "missing expression". I may need a GROUP BY expression... but I don't know what I am doing wrong.
Yeah, I originally messed this up when I first answered because it was on the fly and I didn't have a chance to test what I was putting down. COUNT is an aggregate function, so it can't go in the WHERE clause; the condition belongs in a HAVING clause, and that needs a GROUP BY on the selected columns. The ORDER BY goes last, and since you want to order your results by the state you would use StateName.
SELECT S.StateName, S.Capital, S.Nickname
FROM STATEINFO S
INNER JOIN PERSON10 P ON S.STATE = P.STATE
WHERE P.CAT = 'N'
AND P.Salary BETWEEN 75000 AND 125000
GROUP BY S.StateName, S.Capital, S.Nickname
HAVING COUNT(*) >= 25
ORDER BY S.StateName;
Try moving your count() to HAVING instead of WHERE. You'll also need a GROUP BY clause containing StateName, Capital, and Nickname.
I know this link is Microsoft, not Oracle, but it should be helpful.
https://msdn.microsoft.com/en-us/library/ms180199.aspx?f=255&MSPPError=-2147217396
I'm no Oracle expert, but I'm pretty sure
Person10.Salary in BETWEEN (75000 AND 125000)
should be
Person10.Salary BETWEEN 75000 AND 125000
(no IN and no parentheses). That's how all other SQL dialects I know of work.
Also, move the COUNT() from the WHERE clause to a HAVING clause, add the GROUP BY it needs, and put the ORDER BY last:
CREATE VIEW A10T2 AS
SELECT StateName, Capital, Nickname
FROM STATEINFO INNER JOIN PERSON10 ON
STATEINFO.STATE = PERSON10.STATE
WHERE Person10.CAT = 'N' AND
Person10.Salary BETWEEN 75000 AND 125000
GROUP BY StateName, Capital, Nickname
HAVING count(Person10.CAT) >= 25
ORDER BY StateName;
You can try using a Sub Query like this.
CREATE VIEW A10T2 AS
SELECT statename, capital, nickname
FROM stateinfo
WHERE state IN (SELECT state
FROM person10
WHERE Cat = 'N'
AND Salary BETWEEN 75000 AND 125000
GROUP BY state
HAVING COUNT(*) >= 25)
ORDER BY statename
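The WHERE / GROUP BY / HAVING / ORDER BY ordering discussed in these answers can be checked with a small sketch. It uses Python's sqlite3 module rather than Oracle, the tables are cut down to the columns that matter, all rows are invented, and the threshold is lowered from 25 to 2 so the toy data can satisfy it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE StateInfo (State TEXT, StateName TEXT, Capital TEXT, Nickname TEXT);
    CREATE TABLE Person10 (ID INTEGER, State TEXT, Salary INTEGER, Cat TEXT);
    INSERT INTO StateInfo VALUES
        ('OH', 'Ohio',  'Columbus', 'Buckeye State'),
        ('TX', 'Texas', 'Austin',   'Lone Star State');
    -- Invented people: Ohio has two qualifying rows, Texas only one.
    INSERT INTO Person10 VALUES
        (1, 'OH',  80000, 'N'),
        (2, 'OH', 100000, 'N'),
        (3, 'OH', 200000, 'N'),   -- salary out of range
        (4, 'TX',  90000, 'N'),
        (5, 'TX',  95000, 'Y');   -- wrong Cat value
""")

MIN_PEOPLE = 2  # the assignment asks for 25; lowered for the toy data

# WHERE filters individual rows, HAVING filters whole groups afterwards.
rows = conn.execute("""
    SELECT s.StateName, s.Capital, s.Nickname
    FROM StateInfo AS s
    INNER JOIN Person10 AS p ON s.State = p.State
    WHERE p.Cat = 'N'
      AND p.Salary BETWEEN 75000 AND 125000
    GROUP BY s.StateName, s.Capital, s.Nickname
    HAVING COUNT(*) >= ?
    ORDER BY s.StateName
""", (MIN_PEOPLE,)).fetchall()

print(rows)
```

Only Ohio survives: the row-level filters remove persons 3 and 5 first, and the HAVING clause then drops Texas because a single person is left in its group.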

Oracle Query - Use of Analytical functions

Assume we have loaded a flat file with patient diagnosis data into a table called “Data”. The table structure is:
Create table Data (
Firstname varchar(50),
Lastname varchar(50),
Date_of_birth datetime,
Medical_record_number varchar(20),
Diagnosis_date datetime,
Diagnosis_code varchar(20))
The data in the flat file looks like this:
'jane','jones','2/2/2001','MRN-11111','3/3/2009','diabetes'
'jane','jones','2/2/2001','MRN-11111','1/3/2009','asthma'
'jane','jones','5/5/1975','MRN-88888','2/17/2009','flu'
'tom','smith','4/12/2002','MRN-22222','3/3/2009','diabetes'
'tom','smith','4/12/2002','MRN-33333','1/3/2009','asthma'
'tom','smith','4/12/2002','MRN-33333','2/7/2009','asthma'
'jack','thomas','8/10/1991','MRN-44444','3/7/2009','asthma'
You can assume that no two patients have the same firstname, lastname, and date of birth combination. However one patient might have several visits on different days. These should all have the same medical record number.
The problem is this: Tom Smith has 2 different medical record numbers. Write a query that would always show all the patients
who are like Tom Smith – patients with more than one medical record number.
I came up with the query below. It works perfectly fine, but I wanted to know if there is a better way to write it using Oracle analytic functions. Thank you in advance.
SELECT a.firstname,
a.lastname,
a.date_of_birth,
a.medical_record_number
FROM data a, data b
WHERE a.firstname = b.firstname
AND a.lastname = b.lastname
AND a.date_of_birth = b.date_of_birth
AND a.medical_record_number <> b.medical_record_number
GROUP BY a.firstname,
a.lastname,
a.date_of_birth,
a.medical_record_number
It is possible to do via analytic functions, but whether it's faster than doing the join in your query* or not depends on what data you have. You'd need to test.
with data (firstname, lastname, date_of_birth, medical_record_number, diagnosis_date, diagnosis_code)
as (select 'jane','jones','2/2/2001','MRN-11111',to_date('3/3/2009', 'mm/dd/yyyy'),'diabetes' from dual union all
select 'jane','jones','2/2/2001','MRN-11111',to_date('1/3/2009', 'mm/dd/yyyy'),'asthma' from dual union all
select 'jane','jones','5/5/1975','MRN-88888',to_date('2/17/2009', 'mm/dd/yyyy'),'flu' from dual union all
select 'tom','smith','4/12/2002','MRN-22222',to_date('3/3/2009', 'mm/dd/yyyy'),'diabetes' from dual union all
select 'tom','smith','4/12/2002','MRN-33333',to_date('1/3/2009', 'mm/dd/yyyy'),'asthma' from dual union all
select 'tom','smith','4/12/2002','MRN-33333',to_date('2/7/2009', 'mm/dd/yyyy'),'asthma' from dual union all
select 'jack','thomas','8/10/1991','MRN-44444',to_date('3/7/2009', 'mm/dd/yyyy'),'asthma' from dual),
-- end of mimicking your table and its data
res as (select firstname,
lastname,
date_of_birth,
medical_record_number,
count(distinct medical_record_number) over (partition by firstname, lastname, date_of_birth) cnt_med_rec_nums
from data)
select distinct firstname,
lastname,
date_of_birth,
medical_record_number
from res
where cnt_med_rec_nums > 1;
*btw, the group by in your example query is not necessary; it would make much more sense to switch it out for a distinct - it makes your intent much clearer, since you're wanting to get a distinct set of records.
You can probably simplify the query a bit using a HAVING clause rather than doing a self-join
SELECT a.firstname,
a.lastname,
a.date_of_birth,
MIN(a.medical_record_number) lowest_medical_record_number,
MAX(a.medical_record_number) highest_medical_record_number
FROM data a
GROUP BY a.firstname,
a.lastname,
a.date_of_birth
HAVING COUNT( DISTINCT a.medical_record_number ) > 1
I'm returning the smallest and largest medical record number for each patient here (that's what I'd do if most of the patients with this problem have just two numbers rather than having dozens). You could return just one or you could return a comma-separated list of all the medical record numbers if you'd rather (which would probably make more sense if most of the bad folks have dozens of numbers).
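As a quick check of the HAVING COUNT(DISTINCT ...) approach, here is a sketch that loads the seven sample rows from the question into an in-memory SQLite database via Python's sqlite3 module. SQLite is only a stand-in for Oracle, and the dates are stored as plain text for simplicity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE data (
        firstname TEXT, lastname TEXT, date_of_birth TEXT,
        medical_record_number TEXT, diagnosis_date TEXT, diagnosis_code TEXT
    )
""")

# Sample rows copied from the flat file in the question.
conn.executemany("INSERT INTO data VALUES (?, ?, ?, ?, ?, ?)", [
    ("jane", "jones", "2/2/2001", "MRN-11111", "3/3/2009", "diabetes"),
    ("jane", "jones", "2/2/2001", "MRN-11111", "1/3/2009", "asthma"),
    ("jane", "jones", "5/5/1975", "MRN-88888", "2/17/2009", "flu"),
    ("tom", "smith", "4/12/2002", "MRN-22222", "3/3/2009", "diabetes"),
    ("tom", "smith", "4/12/2002", "MRN-33333", "1/3/2009", "asthma"),
    ("tom", "smith", "4/12/2002", "MRN-33333", "2/7/2009", "asthma"),
    ("jack", "thomas", "8/10/1991", "MRN-44444", "3/7/2009", "asthma"),
])

# One row per patient who has more than one distinct record number.
rows = conn.execute("""
    SELECT firstname, lastname, date_of_birth,
           MIN(medical_record_number) AS lowest_medical_record_number,
           MAX(medical_record_number) AS highest_medical_record_number
    FROM data
    GROUP BY firstname, lastname, date_of_birth
    HAVING COUNT(DISTINCT medical_record_number) > 1
""").fetchall()

print(rows)
```

The two Jane Jones patients are kept apart by their dates of birth and each has a single record number, so only Tom Smith is returned.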

Displaying two rows that differ by one value but are otherwise identical

I am doing a databases course and I have a question that I don't seem to be able to get the answer right to.
There are 3 tables:
country(code, iso_abbreviation, name)
area(name, city, country_code, latitude, longitude, elevation)
attraction(name, type, city, country_name, latitude, longitude, elevation)
Now, the question asks this: areas are found in both the attraction and area tables. List
(country_abbreviation, area_name, latitude, longitude, elevation)
for all the areas above 5000 feet elevation. As there may be some inconsistency between the area and attraction data, latitude, longitude and elevation might differ. In such cases, display both variants of the data.
So I came up with the query below, but I'm not sure it pairs them up correctly and it also doesn't split the data into two rows where one of the (latitude, longitude, elevation) elements is different.
SELECT country.iso_abbreviation as country_abbreviation, area.name as name,
area.latitude, area.longitude, area.elevation
FROM area JOIN country on country.code = area.country_code
JOIN attraction on area.name = attraction.name
WHERE area.elevation > 10000
UNION
SELECT DISTINCT country.iso_abbreviation as country_abbreviation, area.name,
attraction.latitude, attraction.longitude, attraction.elevation
FROM area JOIN country on country.code = area.state_code
JOIN attraction on area.name = attraction.name
WHERE attraction.elevation > 10000 ORDER BY country_abbreviation
;
Could someone please help me out with this?
This would do what you describe:
WITH cte AS (
SELECT c.iso_abbreviation AS country_abbreviation
, a.name, a.latitude, a.longitude, a.elevation
FROM area a
JOIN country c ON c.code = a.country_code
WHERE a.elevation > 5000
)
SELECT * FROM cte
UNION
SELECT c.country_abbreviation
, t.name, t.latitude, t.longitude, t.elevation
FROM cte c
JOIN attraction t USING (name) -- assuming name links area & attraction (?)
ORDER BY country_abbreviation, name -- (?)
But honestly, the table layout as well as the task you have been given seem unclear.
Using a common table expression to reuse results from first query.
UNION (as opposed to UNION ALL) removes full duplicates automatically
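To illustrate how UNION shows both variants only where the data actually differs, here is a sketch run against SQLite through Python's sqlite3 module. The tables are trimmed to the columns the query uses, every row is invented, the join is written with ON instead of USING, and the ORDER BY uses column positions to stay on the safe side for a compound query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE country (code TEXT, iso_abbreviation TEXT, name TEXT);
    CREATE TABLE area (name TEXT, country_code TEXT,
                       latitude REAL, longitude REAL, elevation INTEGER);
    CREATE TABLE attraction (name TEXT,
                             latitude REAL, longitude REAL, elevation INTEGER);
    -- All rows below are invented sample data.
    INSERT INTO country VALUES ('1', 'XX', 'Exampleland');
    INSERT INTO area VALUES
        ('High Peak',  '1', 10.0, 20.0, 6000),
        ('Twin Ridge', '1', 11.0, 21.0, 7000),
        ('Low Hill',   '1', 12.0, 22.0,  300);
    INSERT INTO attraction VALUES
        ('High Peak',  10.0, 20.0, 6000),   -- identical to the area row
        ('Twin Ridge', 11.0, 21.0, 7100),   -- elevation differs
        ('Low Hill',   12.0, 22.0,  300);
""")

rows = conn.execute("""
    WITH cte AS (
        SELECT c.iso_abbreviation AS country_abbreviation,
               a.name, a.latitude, a.longitude, a.elevation
        FROM area a
        JOIN country c ON c.code = a.country_code
        WHERE a.elevation > 5000
    )
    SELECT * FROM cte
    UNION
    SELECT c.country_abbreviation, t.name, t.latitude, t.longitude, t.elevation
    FROM cte c
    JOIN attraction t ON t.name = c.name
    ORDER BY 1, 2, 5   -- country_abbreviation, name, elevation
""").fetchall()

for row in rows:
    print(row)
```

High Peak appears once because its area and attraction rows are identical, while Twin Ridge appears twice, once per elevation value.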

DB2 SQL Join and Max value

The database I'm accessing has two tables I need to query using DB2 SQL, shown here as nametable and addresstable. The query is for finding all of the people with a certain balance due. The addresses are stored in a separate table to keep track of address changes. In addresstable, the latest address is determined by a sequence number (ADDRSEQUENCE). The AddressID field is present in both tables, and is what ties each person to specific addresses. The highest sequence number is the current address. I need that current address for each person and only that one. I know I'm going to have to use MAX somewhere for the sequence number, but I can't figure out how to position it given the join. Here's my current query, which of course returns all addresses...
SELECT NAMETABLE.ACCTNUM AS ACCOUNTNUMBER,
NAMETABLE.NMELASTBUS AS LASTNAME,
NAMETABLE.NAME_FIRST AS FIRSTNAME,
NAMETABLE.BALDUE AS BALANCEDUE,
ADDRESSTABLE.STREETNAME AS ADDR,
ADDRESSTABLE.ADDRLINE2 AS ADDRLINE2,
ADDRESSTABLE.CITYPARISH AS CITY,
ADDRESSTABLE.ADDRSTATE AS STATE,
ADDRESSTABLE.ZIPCODE AS ZIP,
ADDRESSTABLE.ADDIDSEQNO AS ADDRSEQUENCE
FROM NAMETABLE JOIN ADDRESSTABLE ON NAMETABLE.ADDRESSID = ADDRESSTABLE.ADDRESSID
WHERE NAMETABLE.BALANCEDUE >= '50.00'
You can do a sub-select on the MAX(ADDRSEQUENCE) like so:
SELECT
N.ACCTNUM AS ACCOUNTNUMBER
,N.NMELASTBUS AS LASTNAME
,N.NAME_FIRST AS FIRSTNAME
,N.BALDUE AS BALANCEDUE
,A.STREETNAME AS ADDR
,A.ADDRLINE2 AS ADDRLINE2
,A.CITYPARISH AS CITY
,A.ADDRSTATE AS STATE
,A.ZIPCODE AS ZIP
FROM NAMETABLE AS N
JOIN ADDRESSTABLE AS A
ON N.ADDRESSID = A.ADDRESSID
WHERE N.BALANCEDUE >= '50.00'
AND A.ADDRSEQUENCE = (
SELECT MAX(ADDRSEQUENCE)
FROM ADDRESSTABLE AS A2
WHERE A.ADDRESSID = A2.ADDRESSID
)
This is pretty quick in DB2.
You can use a row_number and partition by to do this. Something like this:
with orderedaddress as (
select row_number() over (partition by ADDRESSID order by ADDRSEQUENCE desc) as rown,
STREETNAME,ADDRESSID, ... from ADDRESSTABLE
)
select NAMETABLE.ACCTNUM AS ACCOUNTNUMBER,
...
oa.STREETNAME
...
from NAMETABLE JOIN orderedaddress oa on NAMETABLE.ADDRESSID = oa.ADDRESSID
where oa.rown = 1
and NAMETABLE.BALANCEDUE >= '50.00'
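For anyone who wants to try the "latest address only" idea without a DB2 instance, here is a sketch of the correlated MAX sub-select using Python's sqlite3 module. The tables are reduced to a few columns, the sequence column is simply called ADDRSEQUENCE, the balance is compared as a number, and all rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE NAMETABLE (ACCTNUM INTEGER, NAME_FIRST TEXT,
                            BALDUE REAL, ADDRESSID INTEGER);
    CREATE TABLE ADDRESSTABLE (ADDRESSID INTEGER, ADDRSEQUENCE INTEGER,
                               STREETNAME TEXT);
    -- Invented rows: Ann has moved once, Bob owes less than 50.
    INSERT INTO NAMETABLE VALUES (1, 'Ann', 75.0, 10), (2, 'Bob', 20.0, 11);
    INSERT INTO ADDRESSTABLE VALUES
        (10, 1, 'Old Street 1'),
        (10, 2, 'New Street 2'),
        (11, 1, 'Only Street 3');
""")

# The sub-select finds the highest sequence number for the same ADDRESSID,
# so only the current address of each person survives the join.
rows = conn.execute("""
    SELECT N.ACCTNUM, N.NAME_FIRST, A.STREETNAME, A.ADDRSEQUENCE
    FROM NAMETABLE AS N
    JOIN ADDRESSTABLE AS A ON N.ADDRESSID = A.ADDRESSID
    WHERE N.BALDUE >= 50.00
      AND A.ADDRSEQUENCE = (
          SELECT MAX(A2.ADDRSEQUENCE)
          FROM ADDRESSTABLE AS A2
          WHERE A2.ADDRESSID = A.ADDRESSID
      )
""").fetchall()

print(rows)
```

Ann comes back once with her newest address, and Bob is filtered out by the balance condition.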