I'm trying to generate a report where it lists the race, ethnicity and county of a client. Right now, the report displays 1,2,3, etc. for the race that has been selected as opposed to Black, White, Other, etc. This is also the case for ethnicity and county. I'm wondering what is the proper way to display the letter description as opposed to the integer for the race, ethnicity, and county. I've included my query below, any assistance is appreciated. Thank you.
SELECT
person.idFamily AS Family_ID,
person.id AS Person_ID,
(SELECT person.firstName+ ', ' + person.lastName) AS Name,
person.Race AS Race,
person.Ethnicity AS Ethnicity,
family.capidCounty AS County,
person.birthDate AS BirthDate,
DATEDIFF(year,person.birthDate,getdate()) as Age
FROM Family
LEFT JOIN person ON family.Id = person.idFamily
I am pretty new to using SQL (using StandardSQL via Big Query currently) and unfortunately my Google-fu could not find me a solution to this issue.
I'm working with a dataset where each row is a different person and each column is an attribute (name, age, gender, weight, ethnicity, height, bmi, education level, GPA, etc.). I am trying to 'cluster' these people into all of the feature combinations that match 5 or more people.
Originally I did this manually with 3 feature columns where I would essentially concatenate a 'cluster name' column and then have 7 select queries for each grouping with a >5 where clause, which I then UNIONed together:
gender
age
ethnicity
gender + age
gender + ethnicity
age + ethnicity
gender + age + ethnicity
^ unfortunately doing it this way just balloons the number of combinations and with my anticipated ~15 total features doing it this way seems really unfeasible. I'd also like to do this through a less manual approach so that if a new feature is added in the future it does not require major edits to include it in my cluster identification.
Is there a function or existing process that could accomplish something like this? I'd ideally like to be able to identify ALL combinations that meet my combination user count minimum (so it's expected the same rows would match multiple different clusters here). Any advice or help here would be appreciated! Thanks.
If only BQ supported grouping sets or cube, this would be simple. One method that is pretty generalizable enumerates the 7 groups and then uses bits to figure out what to aggregate:
select (case when n & 1 > 0 then gender end) as gender,
(case when n & 2 > 0 then age end) as age,
(case when n & 4 > 0 then ethnicity end) as ethnicity,
count(*)
from t cross join
unnest(generate_array(1, 7)) n
group by n, 1, 2, 3;
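To see the bitmask idea run end to end, here is a minimal sketch using Python's built-in sqlite3 module rather than BigQuery. The sample rows, the `masks` CTE (standing in for `unnest(generate_array(1, 7))`), the `*_key` aliases and the `min_people` threshold are all assumptions made for illustration; the threshold is set to 2 only so the tiny data set produces output.

```python
import sqlite3

# Hypothetical sample data; table `t` and its columns follow the query above.
rows = [
    ("F", 30, "A"), ("F", 30, "A"), ("F", 30, "B"),
    ("M", 30, "A"), ("M", 40, "B"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (gender TEXT, age INTEGER, ethnicity TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)

# Each n in 1..7 is a bitmask: bit 1 = gender, bit 2 = age, bit 4 = ethnicity.
# A column is kept when its bit is set and replaced by NULL otherwise.
query = """
WITH RECURSIVE masks(n) AS (
    SELECT 1 UNION ALL SELECT n + 1 FROM masks WHERE n < 7
),
keyed AS (
    SELECT n,
           CASE WHEN (n & 1) > 0 THEN gender END AS gender_key,
           CASE WHEN (n & 2) > 0 THEN age END AS age_key,
           CASE WHEN (n & 4) > 0 THEN ethnicity END AS ethnicity_key
    FROM t CROSS JOIN masks
)
SELECT n, gender_key, age_key, ethnicity_key, COUNT(*) AS people
FROM keyed
GROUP BY n, gender_key, age_key, ethnicity_key
HAVING COUNT(*) >= :min_people
ORDER BY n, gender_key, age_key, ethnicity_key
"""
clusters = conn.execute(query, {"min_people": 2}).fetchall()
for row in clusters:
    print(row)
```

Adding a feature means adding one more bit (8, 16, ...) and raising the upper bound of the mask range to 2^k - 1, so the query grows by one line per feature instead of doubling.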
Another method which is trickier is to reconstruct the groups using rollup(). Something like this:
select gender, age, ethnicity, count(*)
from t
group by rollup(gender, age, ethnicity);
Produces three of the groups you want. So:
select gender, age, ethnicity, count(*)
from t
group by rollup(gender, age, ethnicity)
union all
select gender, null, ethnicity, count(*)
from t
group by gender, ethnicity
union all
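select null, age, ethnicity, count(*)
from t
group by rollup (ethnicity, age)
union all
select null, age, null, count(*)
from t
group by age;
The above reconstructs all seven of your groups. The final query adds the age-only group, which neither rollup() produces. Note that each rollup() also emits a grand-total row with all three columns NULL, which you may want to filter out.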
I'm using MS Access for the following task (due to office restrictions). I'm quite new to SQL.
I have the following table:
I want to select all stores grouped by street, zip and place. But I only want to group them if the SquareSum (after GROUP BY) is < 1000. Rue de gare 2 should be grouped, while Bahnhofstrasse 23 should be separate lines.
As far as I know, MS Access doesn't allow a CASE statement. So my query looks like this:
SELECT
Street,
ZIP,
Place,
Sum(Square) AS SumSquare,
FROM Table1
SWITCH (SumSquare > 1000, GROUP BY (Street, ZIP, Place))
I also tried:
GROUP BY
SWITCH (SumSquare > 1000, (Street, ZIP, Place))
But it keeps telling me I have a syntax error. Could someone please help me?
In Access, I would do this with several queries.
This would be easier to do if you had an id on the rows (such as an autonumber).
First query identifies the streets that should be summed.
query: SumTheseStreets
SELECT
Street,
ZIP,
Place,
Sum(Square) AS SumSquare
FROM Table1
GROUP BY Street, ZIP, Place
HAVING sum(Square) < 1000
Note the HAVING clause, which is a bit like a WHERE clause that is applied after the GROUP BY and SUM have been evaluated.
Second query identifies the other rows (notes on this one below):
query: StreetsNotSummed
SELECT
Table1.Street,
Table1.ZIP,
Table1.Place,
Table1.Square AS SumSquare
FROM Table1
LEFT JOIN SumTheseStreets ON Table1.Street = SumTheseStreets.Street AND Table1.ZIP = SumTheseStreets.ZIP AND Table1.Place = SumTheseStreets.Place
WHERE SumTheseStreets.Street IS NULL;
A couple of notes:
I've called the field SumSquare because I want it to be the same name as the SumSquare field in the first query
It uses the first query as one of the input "tables"
This uses a LEFT JOIN, which means "give me all of the rows in the first table (Table1) and, if any rows in the second table (SumTheseStreets) match, put those in as well",
but then it filters out the rows that DO match.
So this query only lists the streets that you want NOT summed.
So now you need a third query.
This simply includes all of the rows in both of those queries.
I'm not too sure on the Access syntax on this one, but there's a union query wizard if this isn't right.
Query: TheAnswerRequired
SELECT
Street,
ZIP,
Place,
SumSquare
FROM SumTheseStreets
UNION
SELECT
Street,
ZIP,
Place,
SumSquare
FROM StreetsNotSummed
(UNION removes duplicate rows, so use UNION ALL if two unsummed rows with identical values must both be kept.)
Good luck.
You can use UNION ALL:
SELECT ts.*
FROM (SELECT Street, Zip, Place, SUM(Square) as SumSquare
FROM Table1
GROUP BY Street, Zip, Place
) as ts
WHERE ts.SumSquare < 1000
UNION ALL
SELECT t1.Street, t1.Zip, t1.Place, t1.Square
FROM Table1 as t1 INNER JOIN
(SELECT Street, Zip, Place, SUM(Square) as SumSquare
FROM Table1
GROUP BY Street, Zip, Place
) as ts
ON t1.Street = ts.Street AND t1.Zip = ts.Zip and t1.Place = ts.Place
WHERE ts.SumSquare >= 1000
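As a quick check of the UNION ALL approach, here is a small sketch run against SQLite from Python instead of Access. The table contents are made up (the original table was not included in the question), chosen so that Rue de gare 2 sums to less than 1000 and Bahnhofstrasse 23 does not; the second branch lists its four columns explicitly so both halves of the UNION ALL have the same shape.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (Street TEXT, ZIP TEXT, Place TEXT, Square INTEGER)")
# Hypothetical rows: the first street sums to 700, the second to 1300.
conn.executemany("INSERT INTO Table1 VALUES (?, ?, ?, ?)", [
    ("Rue de gare 2", "1000", "Lausanne", 300),
    ("Rue de gare 2", "1000", "Lausanne", 400),
    ("Bahnhofstrasse 23", "8000", "Zurich", 600),
    ("Bahnhofstrasse 23", "8000", "Zurich", 700),
])

query = """
-- Groups whose total is below 1000 are returned as one summed row.
SELECT ts.Street, ts.ZIP, ts.Place, ts.SumSquare
FROM (SELECT Street, ZIP, Place, SUM(Square) AS SumSquare
      FROM Table1
      GROUP BY Street, ZIP, Place) AS ts
WHERE ts.SumSquare < 1000
UNION ALL
-- Rows belonging to larger groups are returned individually.
SELECT t1.Street, t1.ZIP, t1.Place, t1.Square
FROM Table1 AS t1
INNER JOIN (SELECT Street, ZIP, Place, SUM(Square) AS SumSquare
            FROM Table1
            GROUP BY Street, ZIP, Place) AS ts
  ON t1.Street = ts.Street AND t1.ZIP = ts.ZIP AND t1.Place = ts.Place
WHERE ts.SumSquare >= 1000
"""
result = sorted(conn.execute(query).fetchall())
for row in result:
    print(row)
```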
I am doing a databases course and I have a question that I don't seem to be able to get the answer right to.
There are 3 tables:
country(code, iso_abbreviation, name)
area(name, city, country_code, latitude, longitude, elevation)
attraction(name, type, city, country_name, latitude, longitude, elevation)
Now, the question asks this: areas are found in both the attraction and area tables. List
(country_abbreviation, area_name, latitude, longitude, elevation)
for all the areas above 5000 feet elevation. As there may be some inconsistency between the area and attraction data, latitude, longitude and elevation might differ. In such cases, display both variants of the data.
So I came up with the query below, but I'm not sure it pairs them up correctly and it also doesn't split the data into two rows where one of the (latitude, longitude, elevation) elements is different.
SELECT country.iso_abbreviation as country_abbreviation, area.name as name,
area.latitude, area.longitude, area.elevation
FROM area JOIN country on country.code = area.country_code
JOIN attraction on area.name = attraction.name
WHERE area.elevation > 10000
UNION
SELECT DISTINCT country.iso_abbreviation as country_abbreviation, area.name,
attraction.latitude, attraction.longitude, attraction.elevation
FROM area JOIN country on country.code = area.state_code
JOIN attraction on area.name = attraction.name
WHERE attraction.elevation > 10000 ORDER BY country_abbreviation
;
Could someone please help me out with this?
This would do what you describe:
WITH cte AS (
SELECT c.iso_abbreviation AS country_abbreviation
, a.name, a.latitude, a.longitude, a.elevation
FROM area a
JOIN country c ON c.code = a.country_code
WHERE a.elevation > 5000
)
SELECT * FROM cte
UNION
SELECT c.country_abbreviation
, t.name, t.latitude, t.longitude, t.elevation
FROM cte c
JOIN attraction t USING (name) -- assuming name links area & attraction (?)
ORDER BY country_abbreviation, name -- (?)
But honestly, the table layout as well as the task you have been given seem unclear.
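This uses a common table expression to reuse the results of the first query.
UNION (as opposed to UNION ALL) removes full duplicates automatically, so an area appears twice only when the attraction row differs in latitude, longitude or elevation.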
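To illustrate how the CTE and UNION interact, here is a small sketch using SQLite from Python. The three tables follow the schema in the question, but every row is invented: one high area whose attraction record disagrees on elevation, one high area whose attraction record is identical, and one low area that should be filtered out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE country (code TEXT, iso_abbreviation TEXT, name TEXT);
CREATE TABLE area (name TEXT, city TEXT, country_code TEXT,
                   latitude REAL, longitude REAL, elevation INTEGER);
CREATE TABLE attraction (name TEXT, type TEXT, city TEXT, country_name TEXT,
                         latitude REAL, longitude REAL, elevation INTEGER);

-- Hypothetical data for illustration only.
INSERT INTO country VALUES ('1', 'PE', 'Peru');
INSERT INTO area VALUES ('Cusco Heights', 'Cusco', '1', -13.5, -72.0, 11000);
INSERT INTO area VALUES ('Altiplano', 'Puno', '1', -15.5, -70.0, 12500);
INSERT INTO area VALUES ('Lowland', 'Lima', '1', -12.0, -77.0, 500);
INSERT INTO attraction VALUES ('Cusco Heights', 'park', 'Cusco', 'Peru', -13.5, -72.0, 11150);
INSERT INTO attraction VALUES ('Altiplano', 'park', 'Puno', 'Peru', -15.5, -70.0, 12500);
INSERT INTO attraction VALUES ('Lowland', 'park', 'Lima', 'Peru', -12.0, -77.0, 500);
""")

query = """
WITH cte AS (
    SELECT c.iso_abbreviation AS country_abbreviation,
           a.name, a.latitude, a.longitude, a.elevation
    FROM area a
    JOIN country c ON c.code = a.country_code
    WHERE a.elevation > 5000
)
SELECT * FROM cte
UNION
SELECT c.country_abbreviation, t.name, t.latitude, t.longitude, t.elevation
FROM cte c
JOIN attraction t USING (name)
ORDER BY country_abbreviation, name
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Cusco Heights comes back twice (once per elevation variant), Altiplano once because UNION collapses the identical pair, and Lowland not at all.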
Let's say I have a database of Amazon customers who made purchases in the last year. It is pretty detailed and has columns like name, age, zip code, income level, favorite color, food, music, etc. Now, let's say I run a query such that I return all Amazon customers who bought Book X.
SELECT NAME, AGE, ZIPCODE, INCOME, FAVECOLOR, FAVEFOOD, FAVEMUSIC
FROM [Amazon].[dbo].[Customers]
WHERE BOOK = "X"
This query will return a bunch of customers who bought Book X. Now, I want to iterate through each of those results (iterate through each customer) and create a query based on each customer's individual age, zipcode, and income.
So if the first result is Bob, age 32, lives in zipcode 90210, makes $45,000 annually, create a query to find all others like Bob who share the same age, zipcode, and income. If the second result is Mary, age 41, lives in zipcode 10004, makes $55,000 annually, create a query to find all others like Mary who share the same age, zipcode, and income.
How do I iterate through customers who bought Book X and run multiple queries whose values (age, zipcode, income) are changing? In terms of viewing the results, it'd be great if I could see Bob, followed by all customers who are like Bob, then Mary, and all customers who are like Mary.
Is this even possible in SQL? I know how to do this in C# (for/next loops with if/then statements inside) but am new to SQL, and the data is in SQL.
I use SQL Server 2008.
If I understood your requirement correctly, then a nested query should do the job. Something like this:
SELECT distinct a.NAME, a.AGE, a.ZIPCODE, a.INCOME, a.FAVECOLOR, a.FAVEFOOD, a.FAVEMUSIC
FROM [Amazon].[dbo].[Customers] a, (SELECT NAME, AGE, ZIPCODE, INCOME, FAVECOLOR, FAVEFOOD, FAVEMUSIC
FROM [Amazon].[dbo].[Customers]
WHERE BOOK = 'X' and name = 'Bob') b
WHERE a.BOOK = 'X' and a.age=b.age and a.zipcode= b.zipcode and a.income=b.income
EDIT: A generic query will be [this returns the list of all matching users]:
SELECT distinct a.NAME, a.AGE, a.ZIPCODE, a.INCOME, a.FAVECOLOR, a.FAVEFOOD, a.FAVEMUSIC
FROM [Amazon].[dbo].[Customers] a, (SELECT distinct NAME, AGE, ZIPCODE, INCOME, FAVECOLOR, FAVEFOOD, FAVEMUSIC, BOOK
FROM [Amazon].[dbo].[Customers]
WHERE BOOK = 'X' ) b
WHERE a.BOOK = b.BOOK and a.age=b.age and a.zipcode= b.zipcode and a.income=b.income
order by a.NAME
Something like this can do it in one query:
;WITH cteSource as
(
SELECT NAME, AGE, ZIPCODE, INCOME, FAVECOLOR, FAVEFOOD, FAVEMUSIC
FROM [Amazon].[dbo].[Customers]
WHERE BOOK = 'X'
)
SELECT sr.NAME AS SrcName, cu.NAME AS LikeName
FROM [Amazon].[dbo].[Customers] AS cu
JOIN cteSource As sr
ON cu.AGE = sr.AGE
And cu.ZIPCODE = sr.ZIPCODE
And cu.INCOME = sr.INCOME
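Here is a runnable sketch of that CTE join, using SQLite from Python in place of SQL Server 2008. The `Customers` rows are invented, the table is trimmed to the columns the join needs, and the `cu.NAME <> sr.NAME` condition is an addition of mine (not in the query above) so that a buyer is not listed as their own match.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Customers (NAME TEXT, AGE INTEGER, ZIPCODE TEXT, INCOME INTEGER, BOOK TEXT)"
)
# Hypothetical customers: Bob and Mary bought book X, the others did not.
conn.executemany("INSERT INTO Customers VALUES (?, ?, ?, ?, ?)", [
    ("Bob", 32, "90210", 45000, "X"),
    ("Alice", 32, "90210", 45000, "Y"),
    ("Mary", 41, "10004", 55000, "X"),
    ("Carl", 41, "10004", 55000, "Z"),
    ("Dave", 50, "60601", 70000, "Y"),
])

query = """
WITH cteSource AS (
    SELECT NAME, AGE, ZIPCODE, INCOME
    FROM Customers
    WHERE BOOK = 'X'
)
SELECT sr.NAME AS SrcName, cu.NAME AS LikeName
FROM Customers AS cu
JOIN cteSource AS sr
  ON cu.AGE = sr.AGE
 AND cu.ZIPCODE = sr.ZIPCODE
 AND cu.INCOME = sr.INCOME
 AND cu.NAME <> sr.NAME      -- added here: skip the buyer matching themselves
ORDER BY sr.NAME, cu.NAME    -- groups each buyer with their look-alikes
"""
pairs = conn.execute(query).fetchall()
for row in pairs:
    print(row)
```

Sorting by the source name first gives the "Bob, then everyone like Bob, then Mary, then everyone like Mary" layout the question asks for, without any explicit loop.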
Something like this will let you chase related customers to an arbitrary degree of separation (5 here). By constructing the JOINs appropriately you can also do things like match income within a range.
with Book as (
select Id, Name, Age, ZIPCode, Income -- ...
from Amazon.dbo.Customers
where Book = 'X' ),
RelatedCustomers as (
select C.Id, C.Name, C.Age, C.ZIPCode, C.Income, 1 as Depth -- ...
from Amazon.dbo.Customers as C inner join
Book as B on B.Id <> C.Id and Abs( B.Income - C.Income ) < 2000 -- and ...
union all
select C.Id, C.Name, C.Age, C.ZIPCode, C.Income, RC.Depth + 1-- ...
from Amazon.dbo.Customers as C inner join
RelatedCustomers as RC on RC.Id <> C.Id and Abs( RC.Income - C.Income ) < 2000 -- and ...
where Depth < 5 )
select *
from RelatedCustomers
I think you need two separate queries. First one to bring back the customers, once a customer such as Bob is selected a second query is performed based on Bob's attributes.
A simple example would be a forms application that has two grids. The first displays a list of the users. When you select one of the users the second grid is populated with the results of the second query.
The second query would be something like:
SELECT NAME, AGE, ZIPCODE, INCOME, FAVECOLOR, FAVEFOOD, FAVEMUSIC
FROM [Amazon].[dbo].[Customers]
WHERE Age = @BobsAge AND ZipCode = @BobsZipCode AND Income = @BobsIncome
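The two-query approach can be sketched as follows, with Python and SQLite standing in for the forms application and SQL Server. The data, the `similar` dictionary and the named parameters (`:age`, `:zip`, `:income`, `:name`) are illustrative assumptions; in T-SQL the parameters would be written `@BobsAge` and so on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Customers (NAME TEXT, AGE INTEGER, ZIPCODE TEXT, INCOME INTEGER, BOOK TEXT)"
)
# Hypothetical customers for illustration.
conn.executemany("INSERT INTO Customers VALUES (?, ?, ?, ?, ?)", [
    ("Bob", 32, "90210", 45000, "X"),
    ("Alice", 32, "90210", 45000, "Y"),
    ("Mary", 41, "10004", 55000, "X"),
    ("Carl", 41, "10004", 55000, "Z"),
])

# First query: the customers who bought the book.
buyers = conn.execute(
    "SELECT NAME, AGE, ZIPCODE, INCOME FROM Customers WHERE BOOK = ? ORDER BY NAME",
    ("X",),
).fetchall()

# Second query: run once per buyer, with that buyer's attributes bound as parameters.
similar = {}
for name, age, zipcode, income in buyers:
    matches = conn.execute(
        "SELECT NAME FROM Customers "
        "WHERE AGE = :age AND ZIPCODE = :zip AND INCOME = :income AND NAME <> :name "
        "ORDER BY NAME",
        {"age": age, "zip": zipcode, "income": income, "name": name},
    ).fetchall()
    similar[name] = [m[0] for m in matches]

print(similar)
```

Binding values as parameters rather than concatenating them into the SQL string keeps the second query reusable and avoids quoting problems.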
It sounds like you want a simple self-join:
SELECT
MatchingCustomers.NAME,
MatchingCustomers.AGE,
MatchingCustomers.ZIPCODE,
MatchingCustomers.INCOME,
MatchingCustomers.FAVECOLOR,
MatchingCustomers.FAVEFOOD,
MatchingCustomers.FAVEMUSIC
FROM
[Amazon].[dbo].[Customers] SourceCustomer
LEFT JOIN [Amazon].[dbo].[Customers] MatchingCustomers
ON SourceCustomer.Age = MatchingCustomers.Age
AND SourceCustomer.ZipCode = MatchingCustomers.ZipCode
AND SourceCustomer.Income = MatchingCustomers.Income
WHERE
SourceCustomer.Book = 'X'
If you want to see all source customers and all of their matches in a single result set, you can also select the columns from SourceCustomer:
SELECT
SourceCustomer.Name SourceName,
SourceCustomer.Age SourceAge,
SourceCustomer.ZipCode SourceZipCode,
SourceCustomer.Income SourceIncome,
MatchingCustomers.NAME,
MatchingCustomers.AGE,
MatchingCustomers.ZIPCODE,
MatchingCustomers.INCOME,
MatchingCustomers.FAVECOLOR,
MatchingCustomers.FAVEFOOD,
MatchingCustomers.FAVEMUSIC
FROM
[Amazon].[dbo].[Customers] SourceCustomer
LEFT JOIN [Amazon].[dbo].[Customers] MatchingCustomers
ON SourceCustomer.Age = MatchingCustomers.Age
AND SourceCustomer.ZipCode = MatchingCustomers.ZipCode
AND SourceCustomer.Income = MatchingCustomers.Income
WHERE
SourceCustomer.Book = 'X'
I have a table of orders placed by customers. I want to check which parts of the country orders have historically come from, and I can only check this by postcode: for instance, an order with a postcode starting SK... is from Stockport, and a postcode starting M... means the order is from Manchester. Is it possible to write a query that counts the orders by postcode?
Some of the fields of the Order table:
OrderNumber OGUID custID firstname last name address postcode email authorisation date etc...
Any suggestion or assistance will be appreciated.
Thanks
Here is a way that works, but it can get too long for a huge list. I will try to find a way around that problem.
SELECT
CASE
WHEN postcode LIKE 'SK%' THEN 'SK'
WHEN postcode LIKE 'M%' THEN 'M'
END AS group_by_value
, COUNT(*) AS group_by_count
FROM [Table] a
GROUP BY
CASE
WHEN postcode LIKE 'SK%' THEN 'SK'
WHEN postcode LIKE 'M%' THEN 'M'
END
If you have a table that contains the city code and city name, then you might be able to use something like the following which joins your orders table to the codes using a LIKE:
select o.postcode,
c.city,
count(c.code) over(partition by c.code) Total
from orders o
inner join codes c
on o.postcode like c.code+'%'
You can use GROUP BY to get the total number of orders in each postcode:
select postcode, count(postcode) TotalOrdersByPostCode
from orders
group by postcode
If you want the City included, then you can also GROUP BY city:
select city, postcode, count(postcode) TotalOrdersByPostCode
from orders
group by city, postcode
select count(1) over(partition by postcode) as countByPostcode, othercolumnhere
from [Order]
Have you tried something like this? The town part of the postcode will be the first 1 or 2 characters, followed by a number, I think. So this will give you the first few letters.
select substring(postcode,1, patindex('%[0-9]%',postcode)-1), count(*)
from [Order]
group by substring(postcode,1, patindex('%[0-9]%',postcode)-1)
Then you'll have to decode M into Manchester, W into West London, GU into Guildford etc...
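The prefix extraction and the decoding step can be combined in one query. The sketch below uses SQLite from Python, which has no patindex, so it swaps in a CASE with GLOB that tests whether the second character is a digit; that substitution, the `orders` and `codes` table names, and all the rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (OrderNumber INTEGER, postcode TEXT);
CREATE TABLE codes (code TEXT, city TEXT);

-- Hypothetical orders and a hypothetical lookup of postcode areas.
INSERT INTO orders VALUES (1, 'SK1 3AB'), (2, 'SK4 2XY'), (3, 'M1 1AA'),
                          (4, 'M20 5ZZ'), (5, 'M3 4AB'), (6, 'GU1 1AA');
INSERT INTO codes VALUES ('SK', 'Stockport'), ('M', 'Manchester'), ('GU', 'Guildford');
""")

query = """
WITH prefixed AS (
    -- One letter if the second character is a digit, otherwise two letters.
    SELECT CASE WHEN substr(postcode, 2, 1) GLOB '[0-9]'
                THEN substr(postcode, 1, 1)
                ELSE substr(postcode, 1, 2)
           END AS area_code
    FROM orders
)
SELECT c.city, COUNT(*) AS order_count
FROM prefixed p
JOIN codes c ON c.code = p.area_code
GROUP BY c.city
ORDER BY c.city
"""
counts = conn.execute(query).fetchall()
for row in counts:
    print(row)
```

Matching on the exact extracted prefix, rather than with LIKE 'M%', keeps a one-letter area such as M from also catching two-letter areas that begin with the same letter.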