PostgreSQL: Self-referencing, flattening join to table which contains tree of objects - sql

I have a relatively large (more than 10^6 entries) table called "things" which represents locatable objects, e.g. countries, areas, cities, streets, etc. They are used as a tree of objects with a fixed depth, so the table structure looks like this:
id
name
type
continent_id
country_id
city_id
area_id
street_id
etc.
The association inside "things" is 1:n, i.e. a street or area always belongs to a defined city and country (not two or none); the column city_id for example contains the id of the "city" thing for all the objects which are inside that city. The "type" column contains the type of thing (Street, City, etc) as a string.
This table is referenced in another table "actions" as "thing_id". I am trying to generate a table of action location statistics showing the number of active and inactive actions a given location has. A simple JOIN like
SELECT count(nullif(actions.active, 1)) AS icount,
count(nullif(actions.active, 0)) AS acount,
things.name AS name, things.id AS thing_id, things.city_id AS city_id
FROM "actions"
LEFT JOIN things ON actions.thing_id = things.id
WHERE UPPER(substring(things.name, 1, 1)) = UPPER('A')
AND actions.datetime_at BETWEEN '2012-09-26 19:52:14' AND '2012-10-26 22:00:00'
GROUP BY things.name, things.id ORDER BY things.name
will give me a list of "things" (starting with 'A') which have actions associated with them and their active and inactive count like this:
icount | acount | name                     | thing_id | city_id
-------+--------+--------------------------+----------+--------
0      | 5      | Brooklyn, New York City  | 25       | 23
1      | 0      | Manhattan, New York City | 24       | 23
3      | 2      | New York City            | 23       | 23
Now I would like to
1. only consider "city" things (that's easy: filter by type in "things"), and
2. in the active/inactive counts, use the sum of all actions happening in this city, regardless of whether the action is associated with the city itself or with something inside the city (= having the same city_id).
With the same dataset as above, the new query should result in
icount | acount | name          | thing_id | city_id
-------+--------+---------------+----------+--------
4      | 7      | New York City | 23       | 23
I do not need the thing_id in this table (since it would not be unique anyway), but since I do need the city's name (for display), it is probably just as easy to also output the ID, then I don't have to change as much in my code.
How would I have to modify the above query to achieve this? I'd like to avoid additional trips to the database, and advanced SQL features such as procedures, triggers, views and temporary tables, if possible.
I'm using Postgres 8.3 with Ruby 1.9.3 on Rails 3.0.14 (on Mac OS X 10.7.4).
Thank you! :)

You need to count actions for all things in the city in an independent subquery and then join to a limited set of things:
SELECT c.icount
      ,c.acount
      ,t.name
      ,t.id AS thing_id
      ,t.city_id
FROM  (
   SELECT t.city_id
         ,count(nullif(a.active, 1)) AS icount  -- inactive actions
         ,count(nullif(a.active, 0)) AS acount  -- active actions
   FROM   things t
   LEFT   JOIN actions a ON a.thing_id = t.id
                        AND a.datetime_at BETWEEN '2012-09-26 19:52:14'
                                              AND '2012-10-26 22:00:00'
   WHERE  t.city_id = 23  -- optional: restrict results to one city
   GROUP  BY t.city_id
   ) c  -- counts per city
JOIN   things t ON t.id = c.city_id  -- limit to the city's own row
WHERE  t.name ILIKE 'A%'
ORDER  BY t.name, t.id;
I also simplified a number of other things in your query and used table aliases to make it easier to read.
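For comparison, the same per-city totals can also be written as a self-join on things, without a subquery. This is only a sketch under a few assumptions that the question does not state explicitly: that city rows carry type = 'City', that a city's own row has city_id equal to its id (as in the sample output, 23 | 23), and that active is an integer 0/1 column. The schema is reduced to the columns needed here, and the sample rows are made up to reproduce the counts from the question.

```sql
-- Hypothetical minimal schema and sample data, for illustration only.
CREATE TABLE things (
    id      integer PRIMARY KEY,
    name    text NOT NULL,
    type    text NOT NULL,
    city_id integer
);
CREATE TABLE actions (
    id          serial PRIMARY KEY,
    thing_id    integer NOT NULL REFERENCES things (id),
    active      integer NOT NULL,   -- assumed: 1 = active, 0 = inactive
    datetime_at timestamp NOT NULL
);

INSERT INTO things (id, name, type, city_id) VALUES
    (23, 'New York City',            'City', 23),
    (24, 'Manhattan, New York City', 'Area', 23),
    (25, 'Brooklyn, New York City',  'Area', 23);

INSERT INTO actions (thing_id, active, datetime_at) VALUES
    (25, 1, '2012-10-01 10:00'), (25, 1, '2012-10-02 10:00'),
    (25, 1, '2012-10-03 10:00'), (25, 1, '2012-10-04 10:00'),
    (25, 1, '2012-10-05 10:00'),
    (24, 0, '2012-10-06 10:00'),
    (23, 0, '2012-10-07 10:00'), (23, 0, '2012-10-08 10:00'),
    (23, 0, '2012-10-09 10:00'),
    (23, 1, '2012-10-10 10:00'), (23, 1, '2012-10-11 10:00');

-- One row per city; counts cover the city itself and everything inside it.
SELECT count(nullif(a.active, 1)) AS icount
      ,count(nullif(a.active, 0)) AS acount
      ,city.name
      ,city.id AS thing_id
      ,city.city_id
FROM   things city
JOIN   things t ON t.city_id = city.id
LEFT   JOIN actions a ON a.thing_id = t.id
                     AND a.datetime_at BETWEEN '2012-09-26 19:52:14'
                                           AND '2012-10-26 22:00:00'
WHERE  city.type = 'City'
GROUP  BY city.name, city.id, city.city_id
ORDER  BY city.name;
-- With the sample rows above: icount = 4, acount = 7 for New York City.
```

Keeping the date range in the join condition rather than in WHERE means a city with no actions in that period still appears, with both counts at 0.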

Related

SQL row names and row flags

I have trouble understanding row flags. The following question should clear it up for me:
Is it possible to store a name and its flag in the same cell in SQL?
Consider a table called cars with the columns number_plate, colour, and brand_name, where brand_name holds both a name and a flag.
How would one store that in a single column? If it is not possible or not advised, explain why and how to do it instead.
How would you then get the number of cars from a given country (based on the unique number_plate, which is the primary key) and the country flag?
I think you are trying to design a schema but haven't quite got the hang of foreign keys.
In your example, you'd have the following tables:
Country
country_id | name    | continent
-----------+---------+-----------
1          | Germany | Europe
2          | Japan   | Asia
3          | USA     | N.America
Brand
brand_id | name     | country_id (foreign key)
---------+----------+-------------------------
1        | Mercedes | 1
2        | Toyota   | 2
3        | BMW      | 1
4        | Chrysler | 3
Car
number_plate | colour | brand_id
-------------+--------+---------
xxx-yy-zz    | Green  | 1
aa-bb-cc     | Red    | 1
kkk-l-mmm    | Orange | 2
....
To find the number of cars, based on the country where the brand is based, you'd do something like:
select country.name,
count(*)
from car
inner join brand on car.brand_id = brand.brand_id
inner join country on brand.country_id = country.country_id
group by country.name
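To make the foreign-key design above concrete, and to cover the "flag" part of the question, the flag can live in its own column on the country table rather than being packed into brand_name. The following is a rough sketch in generic SQL; the flag column and its contents (here, the file name of a flag image) are assumptions for illustration and are not part of the tables shown above.

```sql
-- Hypothetical DDL for the three tables; "flag" is an assumed extra column.
CREATE TABLE country (
    country_id INTEGER PRIMARY KEY,
    name       VARCHAR(100) NOT NULL,
    continent  VARCHAR(50),
    flag       VARCHAR(255)          -- e.g. file name of the flag image
);

CREATE TABLE brand (
    brand_id   INTEGER PRIMARY KEY,
    name       VARCHAR(100) NOT NULL,
    country_id INTEGER NOT NULL REFERENCES country (country_id)
);

CREATE TABLE car (
    number_plate VARCHAR(20) PRIMARY KEY,
    colour       VARCHAR(30),
    brand_id     INTEGER NOT NULL REFERENCES brand (brand_id)
);

-- Number of cars per country, together with that country's flag.
SELECT country.name,
       country.flag,
       COUNT(*) AS car_count
FROM car
INNER JOIN brand   ON car.brand_id = brand.brand_id
INNER JOIN country ON brand.country_id = country.country_id
GROUP BY country.name, country.flag;
```

Because every value sits in its own column, restricting the result to one country is a plain WHERE country.name = '...' rather than a pattern search inside a combined string.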
Let's say name and flag are two separate columns. Using the concat function, they can be combined into a single output column named brand_name at query time:
select number_plate, colour, concat(name,' ',flag) as brand_name from cars
To get the number of (unique) cars for a given flag:
select count(*) as car_count from
(select
distinct number_plate,
colour,
concat(name,' ',flag) as brand_name from cars
) a
where brand_name like '%UK%'

How to do an exact match followed by ORDER BY in PostgreSQL

I'm trying to write a query that puts some results (in my case a single result) at the top, and then sorts the rest. I have yet to find a PostgreSQL solution.
Say I have a table called airports like so.
id | code | display_name
----+------+----------------------------
1 | SDF | International
2 | INT | International Airport
3 | TES | Test
4 | APP | Airport Place International
In short, I have a query in a controller method that gets called asynchronously when a user text searches for an airport either by code or display_name. However, when a user types in an input that matches a code exactly (airport code is unique), I want that result to appear first, and all airports that also have int in their display_name to be displayed afterwards in ascending order. If there is no exact match, it should return any wildcard matches sorted by display_name ascending. So if a user types in INT, the row (2, INT, International Airport) should be returned first, followed by the others:
Results:
1. INT | International Airport
2. APP | Airport Place International
3. SDF | International
Here's the kind of query I was tinkering with that is slightly simplified to make sense outside the context of my application but same concept nonetheless.
SELECT * FROM airports a
WHERE a.display_name LIKE 'somesearchtext%'
ORDER BY (CASE WHEN a.code = 'somesearchtext' THEN a.code ELSE a.display_name END)
Right now, if I type INT, I'm getting
Results:
1. APP | Airport Place International
2. INT | International Airport
3. SDF | International
My ORDER BY must be incorrect, but I can't seem to get it right.
Any help would be greatly appreciated :)
If you want an exact match on code to return first, then I think this does the trick:
SELECT a.*
FROM airports a
WHERE a.display_name LIKE 'somesearchtext%'
ORDER BY (CASE WHEN a.code = 'somesearchtext' THEN 1 ELSE 2 END),
a.display_name
You could also write this as:
ORDER BY (a.code = 'somesearchtext') DESC, a.display_name
This isn't standard SQL, but it is quite readable.
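As a quick check, here is the boolean form run against the sample rows from the question. It is a sketch for PostgreSQL, and the WHERE clause is an assumption: an exact code match, or a case-insensitive substring match on display_name via ILIKE, since the desired output in the question includes names that merely contain the search text.

```sql
CREATE TABLE airports (
    id           integer PRIMARY KEY,
    code         text UNIQUE NOT NULL,
    display_name text NOT NULL
);

INSERT INTO airports (id, code, display_name) VALUES
    (1, 'SDF', 'International'),
    (2, 'INT', 'International Airport'),
    (3, 'TES', 'Test'),
    (4, 'APP', 'Airport Place International');

-- Exact code match first (true sorts before false with DESC),
-- then the remaining matches by display_name ascending.
SELECT a.code, a.display_name
FROM   airports a
WHERE  a.code = 'INT'
   OR  a.display_name ILIKE '%int%'
ORDER  BY (a.code = 'INT') DESC, a.display_name;
-- Returns INT, then APP, then SDF.
```

In application code, the literal 'INT' would normally be passed in as a bound parameter rather than interpolated into the query string.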
I think you can achieve your goal by using a UNION.
First get the exact match and then append the rest of the data as you wish.
e.g. (you will need to work on this a bit)
SELECT 1 AS sort_key, a.* FROM airports a
WHERE a.code = 'somesearchtext'
UNION ALL
SELECT 2 AS sort_key, a.* FROM airports a
WHERE a.code <> 'somesearchtext' AND a.display_name LIKE 'somesearchtext%'
ORDER BY sort_key, display_name

Selecting more after group-by while using join

At the moment I am busy with two tables, Students and Classes. These two both contain a column project_group, a way to categorize multiple students from one class into smaller groups.
In the Students table there is a column City that states in which town/city each student lives. Among the rows that have been filled in, several cities already occur multiple times. The code I used to check how many times each city occurs is this:
SELECT City, count(*)
FROM Students
GROUP BY City
Now the next thing I want to do is show per class in which cities the students live and how many live there, so for example a result like:
A | - | 2
A | New York | 3
A | Los Angeles | 1
B | - | 1
B | Miami | 2
B | Seattle | 1
Students and Classes can be joined on the column project_group, but what I'm mostly interested in is combining the GROUP BY mentioned earlier with the JOIN and showing the results per class.
Thanks in advance,
KRAD
I'm not sure what the column name is for A and B in your example. I'm assuming Classes.Class in the following:
SELECT
C.Class
, S.City
, COUNT(*) AS Count
FROM
Classes AS C INNER JOIN
Students AS S ON C.Project_Group = S.Project_Group
GROUP BY
C.Class
, S.City
I managed to get it working. While doing some tests to see which exact error message it was that I got, I used this and managed to get it working. I now get an overview per class that shows how many people live in which city. This is the code used.
SELECT class_id, city, count(*) AS amount
FROM students, classes
WHERE students.project_group = classes.project_group
GROUP BY class_id, city
ORDER BY class_id
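The desired output in the question also shows a "-" row for students without a city. A possible variant is sketched below, assuming that a missing city is stored as NULL and reusing the class_id, city, and project_group columns from the query above. The table definitions and sample rows are made up for illustration.

```sql
CREATE TABLE classes  (class_id VARCHAR(10), project_group INTEGER);
CREATE TABLE students (name VARCHAR(50), city VARCHAR(50), project_group INTEGER);

INSERT INTO classes  VALUES ('A', 1);
INSERT INTO classes  VALUES ('B', 2);
INSERT INTO students VALUES ('Ann',  'New York', 1);
INSERT INTO students VALUES ('Ben',  'New York', 1);
INSERT INTO students VALUES ('Cleo', NULL,       1);
INSERT INTO students VALUES ('Dan',  'Miami',    2);

-- Students with no city are grouped under '-' instead of NULL.
SELECT c.class_id,
       COALESCE(s.city, '-') AS city,
       COUNT(*) AS amount
FROM classes c
INNER JOIN students s ON s.project_group = c.project_group
GROUP BY c.class_id, COALESCE(s.city, '-')
ORDER BY c.class_id, city;
-- With the sample rows: (A, -, 1), (A, New York, 2), (B, Miami, 1)
```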

statistic syntax in access

I want to compute some statistics for the points in my application. These are the columns of the Point table:
id | type  | city
---+-------+-----------
1  | food  | NewYork
2  | food  | Washington
3  | sport | NewYork
4  | food  | .....
Each point belongs to a certain type and is located in a certain city.
Now I want to calculate the number of points in each city for each type.
For example, there are two types here: food and sport.
Then I want to know:
how many points of `food` and `sport` at NewYork
how many points of `food` and `sport` at Washington
how many points of `food` and `sport` at Chicago
......
I have tried this:
select type,count(*) as num from point group by type ;
But I cannot also group by the city.
How can I do that?
Update
id | type  | city
---+-------+--------
1  | food  | NewYork
2  | sport | NewYork
3  | food  | Chicago
4  | food  | San
And I want to get something like this:
      | NewYork | Chicago | San
------+---------+---------+----
food  | 2       | 1       | 1
sport | 1       | 0       | 0
I will use an HTML table and a chart to display this data.
So I need to do the counting. I could use something like this:
select count(*) from point where type='food' and city ='San'
select count(*) from point where type='food' and city ='NewYork'
....
However, I think this is a bad idea, so I wonder if I can do the counting in SQL.
BTW, for table data like this, how do people organize the structure using JSON?
This is what you want (Access SQL has no CASE expression, so IIf is used instead):
SELECT city,
SUM(IIf([type] = 'food', 1, 0)) AS FoodCount,
SUM(IIf([type] = 'sport', 1, 0)) AS SportCount
FROM point
GROUP BY city
UPDATE:
To get the results in an aggregated row/column format you need to use a pivot table. In Access it's called a Crosstab query. You can use the Crosstab query wizard to generate the query via a nice UI or cut straight to the SQL:
TRANSFORM COUNT(id) AS CountOfId
SELECT type
FROM point
GROUP BY type
PIVOT city
The grouping is used to count the number of Id's for each type. The additional PIVOT clause groups the data by city and displays each grouping in a separate column. The end result looks something like this:
      | NewYork | Chicago | San
------+---------+---------+----
food  | 2       | 1       | 1
sport | 1       | 0       | 0

Creating new table from data of other tables

I'm very new to SQL and I hope someone can help me with some SQL syntax. I have a database with these tables and fields,
DATA: data_id, person_id, attribute_id, date, value
PERSONS: person_id, parent_id, name
ATTRIBUTES: attribute_id, attribute_type
attribute_type can be "Height" or "Weight"
Question 1
Given a person's "Name", I would like to return a table of "Weight" measurements for each child, i.e. if John has 3 children named Alice, Bob and Carol, then I want a table like this
| date | Alice | Bob | Carol |
I know how to get a long list of children's weights like this:
select d.date,
d.value
from data d,
persons child,
persons parent,
attributes a
where parent.name = 'John'
and child.parent_id = parent.person_id
and d.person_id = child.person_id
and d.attribute_id = a.attribute_id
and a.attribute_type = 'Weight';
but I don't know how to create a new table that looks like:
| date | Child 1 name | Child 2 name | ... | Child N name |
Question 2
Also, I would like to select the attributes to be between a certain range.
Question 3
What happens if the dates are not consistent across the children? For example, suppose Alice is 3 years older than Bob, then there's no data for Bob during the first 3 years of Alice's life. How does the database handle this if we request all the data?
1) It might not be so easy. MS SQL Server can PIVOT a table on an axis, but dumping the resultset to an array and sorting there (assuming this is tied to some sort of program) might be the simpler way right now if you're new to SQL.
If you can manage to do it in SQL it still won't be enough info to create a new table, just return the data you'd use to fill it in, so some sort of external manipulation will probably be required. But you can probably just use INSERT INTO [new table] SELECT [...] to fill that new table from your select query, at least.
2) You can join on attributes once for each unique attribute, putting the type filter into the join condition:
SELECT [...] FROM data AS d
JOIN persons AS p ON d.person_id = p.person_id
LEFT JOIN attributes AS weight ON d.attribute_id = weight.attribute_id
AND weight.attribute_type = 'Weight'
LEFT JOIN attributes AS height ON d.attribute_id = height.attribute_id
AND height.attribute_type = 'Height'
[...]
(The way you're joining in the original query is just shorthand for [INNER] JOIN .. ON. The filter on attribute_type belongs in the ON clause or in WHERE, not in HAVING, which only applies after GROUP BY. A range on the measurements themselves can be added with WHERE d.value BETWEEN [...] AND [...].)
3) It depends on the type of JOIN you use to match parent/child relationships, and any dates you're filtering on in the WHERE, if I'm reading that right (entirely possible I'm not). I'm not sure quite what you're looking for, or what kind of database you're using, so no good answer. If you're new enough to SQL that you don't know the different kinds of JOINs and what they can do, it's very worthwhile to learn them - they put the R in RDBMS.
When you do a select, you need to specify the exact columns you want. In other words, you can't return the Nth child's name as a column, i.e. this isn't possible:
1/2/2010 | Child_1_name | Child_2_name | Child_3_name
1/3/2010 | Child_1_name
1/4/2010 | Child_1_name | Child_2_name
Each record needs to have the same number of columns. So you might be able to make a select that does this:
1/2/2010 | Child_1_name
1/2/2010 | Child_2_name
1/2/2010 | Child_3_name
1/3/2010 | Child_1_name
1/4/2010 | Child_1_name
1/4/2010 | Child_2_name
And then, in a report, remap it to how you want it displayed.
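If the children's names are known in advance, the long list can still be folded into one column per child inside SQL by means of conditional aggregation. The following is a sketch under that assumption, using the table layout from the question with made-up sample rows and generic SQL (column names such as date may need quoting in some databases). Dates on which a child has no measurement simply come back as NULL, which also covers Question 3.

```sql
CREATE TABLE persons    (person_id INTEGER PRIMARY KEY, parent_id INTEGER, name VARCHAR(50));
CREATE TABLE attributes (attribute_id INTEGER PRIMARY KEY, attribute_type VARCHAR(20));
CREATE TABLE data (
    data_id      INTEGER PRIMARY KEY,
    person_id    INTEGER,
    attribute_id INTEGER,
    date         DATE,
    value        DECIMAL(6,2)
);

INSERT INTO persons VALUES (1, NULL, 'John');
INSERT INTO persons VALUES (2, 1, 'Alice');
INSERT INTO persons VALUES (3, 1, 'Bob');
INSERT INTO attributes VALUES (1, 'Height');
INSERT INTO attributes VALUES (2, 'Weight');
INSERT INTO data VALUES (1, 2, 2, '2010-01-02', 20.5);  -- Alice, weight
INSERT INTO data VALUES (2, 3, 2, '2010-01-02', 15.0);  -- Bob, weight
INSERT INTO data VALUES (3, 2, 2, '2010-01-03', 20.7);  -- Alice only

-- One row per date, one column per (known) child.
SELECT d.date,
       MAX(CASE WHEN child.name = 'Alice' THEN d.value END) AS alice,
       MAX(CASE WHEN child.name = 'Bob'   THEN d.value END) AS bob
FROM data d
JOIN persons child  ON d.person_id = child.person_id
JOIN persons parent ON child.parent_id = parent.person_id
JOIN attributes a   ON d.attribute_id = a.attribute_id
WHERE parent.name = 'John'
  AND a.attribute_type = 'Weight'
GROUP BY d.date
ORDER BY d.date;
-- 2010-01-02 | 20.50 | 15.00
-- 2010-01-03 | 20.70 | NULL
```

For an unknown number of children, the column list would have to be generated by the calling program before the query is sent, which is in line with the point above that a select needs its exact columns spelled out.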