SQL: Tree structure without parent key - sql

Note: The Data schema can not be changed. I'm stuck with it.
Database: SQLite
I have a simple tree structure, without parent keys, that is only 1 level deep. I have simplified the data for clarity:
ID Content Title
1 Null Canada
2 25 Toronto
3 33 Vancouver
4 Null USA
5 45 New York
6 56 Dallas
The structure is ordinal as well, so all Canadian cities have IDs greater than Canada's ID of 1 and less than the USA's ID of 4.
Question: How do I select all of a nation's cities when I do not know how many there are?

My query pairs every country with each of its cities, which may be more than you want, but:
http://sqlfiddle.com/#!5/94d63/3
SELECT *
FROM (
SELECT
place.Title AS country_name,
place.ID AS id,
(SELECT MIN(ID)
FROM place AS next_place
WHERE next_place.ID > place.ID
AND next_place.Content IS NULL
) AS next_id
FROM place
WHERE place.Content IS NULL
) AS country
INNER JOIN place
ON place.ID > country.id
AND CASE WHEN country.next_id IS NOT NULL
THEN place.ID < country.next_id
ELSE 1 END

select * from tbl
where id > 1
and id < (select min(id) from tbl where content is null and id > 1)
EDIT
I just realized the above does not work if there is no country with a greater ID. This should fix it.
select a.*
from tbl a
join (select 4 as country_id) c on a.id > c.country_id
where a.id < coalesce((select min(b.id) from tbl b where b.content is null and b.id > c.country_id), a.id + 1)
Edit 2 - The country id now appears in only one place, so that is the only thing you have to change.
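A minimal sketch of the single-country lookup, assuming Python's built-in sqlite3 module and the sample rows from the question. The function name cities_of is hypothetical. The upper bound is the next row whose content is NULL; when there is none (the last country), COALESCE falls back to id + 1, which lets every later row through.

```python
import sqlite3

# In-memory copy of the sample table from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER PRIMARY KEY, content INTEGER, title TEXT)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?)",
    [(1, None, "Canada"), (2, 25, "Toronto"), (3, 33, "Vancouver"),
     (4, None, "USA"), (5, 45, "New York"), (6, 56, "Dallas")],
)


def cities_of(country_id):
    """Return the titles of the rows between country_id and the next country."""
    sql = """
        SELECT a.title
        FROM tbl AS a
        WHERE a.id > :cid
          AND a.id < COALESCE(
                (SELECT MIN(b.id) FROM tbl AS b
                 WHERE b.content IS NULL AND b.id > :cid),
                a.id + 1)  -- no later country: accept every later row
        ORDER BY a.id
    """
    return [row[0] for row in conn.execute(sql, {"cid": country_id})]


print(cities_of(1))  # cities listed under Canada
print(cities_of(4))  # cities listed under USA, the last country
```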

There are two cases to consider: either your data is going to change or it isn't. If it is going to change there are two solutions, and if it isn't, just one.
If your data is organized exactly as shown in your example, you can skip the first three rows. SQLite has no TOP, so use LIMIT instead, i.e.
SELECT * FROM CITIES WHERE ID NOT IN (SELECT ID FROM CITIES ORDER BY ID LIMIT 3)
Alternatively, you can create another table where you specify which city belongs to which parent, and build the hierarchy yourself.
I recommend the second option.
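The mapping-table idea could be sketched as follows, assuming Python's sqlite3 module and the sample rows from the question. The table name place_parent and its columns are hypothetical. The existing table is left untouched; each city is assigned the nearest preceding row whose content is NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE place (id INTEGER PRIMARY KEY, content INTEGER, title TEXT)")
conn.executemany(
    "INSERT INTO place VALUES (?, ?, ?)",
    [(1, None, "Canada"), (2, 25, "Toronto"), (3, 33, "Vancouver"),
     (4, None, "USA"), (5, 45, "New York"), (6, 56, "Dallas")],
)

# Hypothetical side table: one row per city, pointing at its country.
conn.execute(
    "CREATE TABLE place_parent (city_id INTEGER PRIMARY KEY, country_id INTEGER NOT NULL)"
)
conn.execute("""
    INSERT INTO place_parent (city_id, country_id)
    SELECT city.id,
           (SELECT MAX(country.id)          -- nearest country row above the city
            FROM place AS country
            WHERE country.content IS NULL AND country.id < city.id)
    FROM place AS city
    WHERE city.content IS NOT NULL
""")

# With the mapping in place, the hierarchy is an ordinary join.
pairs = conn.execute("""
    SELECT country.title, city.title
    FROM place_parent AS pp
    JOIN place AS country ON country.id = pp.country_id
    JOIN place AS city ON city.id = pp.city_id
    ORDER BY city.id
""").fetchall()
print(pairs)
```

The mapping has to be rebuilt or maintained whenever rows are added, which is the price of not storing a parent key in the original table.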

Related

Group data based on the mode in sql

I have a large dataset in Microsoft Access with postcodes in one column and an unordered category in the next as follows:
Postcode | Class
--------------------------
1111AA | A
1111AA | B
1111AA | A
1111AB | C
1111AB | C
1111AB | A
I would like to group the data such that on the left-hand side I have one Postcode for the mode of the Class on the right. The classes are unordered (i.e: A is not better than B, nor C better than B). I have tried using queries but they only really work for numerical data and I can only seem to use these techniques for finding averages.
So in the end I want:
Postcode | Class
------------------------
1111AA | A
1111AB | C
To get the most frequent class per postal code, you need a pre-query first and then match against its count, which basically means three passes over the table.
The innermost level (alias QPerClass) gets, for each postal code, each class and its count. A maximum class count of 3 in one postal code and a count of 5 in another are two separate things: you don't want the overall highest count of 5, because a postal code whose maximum is only 3 would never find a corresponding match.
So, per your sample data, this would result in
PostCode Class Count
1111AA A 2
1111AA B 1
1111AB C 2
1111AB A 1
From that, we need each postal code's MAXIMUM COUNT. You can't grab the class associated with it in the same step, because a plain MAX() does not carry the class along, and you can't do a LIMIT 1 per postal code. This would result in
1111AA 2
1111AB 2
Now that you have the count per postal code, join that to your original table and apply a group by AND a having so that the outer-level HAVING COUNT(*) matches the MAX() count as determined in the second step.
select
    pc.postCode,
    pc.class,
    MaxByPostCode.MaxCount
from
    PostalCodes pc
    JOIN ( select QPerClass.postCode,
                  max( QPerClass.perClassCount ) MaxCount
           from
               ( select pc2.postcode,
                        pc2.class,
                        count(*) perClassCount
                 from
                     postalcodes pc2
                 group by
                     pc2.postcode,
                     pc2.class ) QPerClass
           group by
               QPerClass.postCode ) MaxByPostCode
        on pc.postcode = MaxByPostCode.postCode
group by
    pc.postCode,
    pc.class,
    MaxByPostCode.MaxCount
having
    count(*) = MaxByPostCode.MaxCount
Now, if you have an instance where there are multiple entries that have the same MODE (max count per class), then you would have to wrap it up yet again to get the MIN( CLASS ) that qualified the HAVING clause grouped by postal code, such as
select
m.postcode,
min( m.class )
from
( entire query above with the HAVING clause ) m
group by
m.postcode
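The steps above can be checked with a small sketch, assuming Python's sqlite3 module rather than Access. It is a variation, not the exact query: it joins the per-class counts directly to the per-postcode maximum instead of going back to the base table with HAVING, then applies the MIN(class) tie-break. The postcode 1111AC is an extra row set, not in the question, added only to exercise the tie.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PostalCodes (postcode TEXT, class TEXT)")
conn.executemany(
    "INSERT INTO PostalCodes VALUES (?, ?)",
    [("1111AA", "A"), ("1111AA", "B"), ("1111AA", "A"),
     ("1111AB", "C"), ("1111AB", "C"), ("1111AB", "A"),
     # Extra postcode (not in the question): B and A tie with one row each.
     ("1111AC", "B"), ("1111AC", "A")],
)

sql = """
    SELECT per_class.postcode, MIN(per_class.class) AS class  -- tie-break
    FROM (SELECT postcode, class, COUNT(*) AS n              -- step 1: counts
          FROM PostalCodes
          GROUP BY postcode, class) AS per_class
    JOIN (SELECT postcode, MAX(n) AS max_n                    -- step 2: maximum
          FROM (SELECT postcode, class, COUNT(*) AS n
                FROM PostalCodes
                GROUP BY postcode, class) AS counts
          GROUP BY postcode) AS best
      ON best.postcode = per_class.postcode
     AND best.max_n = per_class.n                             -- step 3: match
    GROUP BY per_class.postcode
    ORDER BY per_class.postcode
"""
modes = conn.execute(sql).fetchall()
print(modes)
```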
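Another option, in SQL Server syntax. Note that it ranks by count across all postcodes, so it returns a row for every postcode only when their top counts happen to be equal, as they are in the sample data: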
SELECT TOP 1 WITH TIES PostCode, Class
FROM #Temp
GROUP BY PostCode, Class
ORDER BY COUNT(*) DESC
Use a correlated subquery to get the most often occurring class for a postcode:
select postcode, class
from mytable as m
where class =
(
select top 1 m2.class
from mytable m2
where m2.postcode = m.postcode
group by m2.class
order by count(*) desc
)
group by m.postcode, m.class;
In case of a tie, one class is picked arbitrarily.

Select MAX Value from pivot

I am working on an application where the user enters their name, age, state of residence. We want to capture the state to use for various things. Very straight forward when only one name, age, state of residence is listed. In some cases the user may list more than one name, age, state of residence. All of the data is stored in one table called Table. There is an ID, Name, and Value Column. I am attempting to pivot the data and then select the State of residence I need. There are two scenarios I need to account for.
-There are multiple names, ages, states of residence entered on the application. The state of residence we want is the one with the highest age. i.e.
Name Age State of Residence
John 25 FL
Bill 31 AL
Sue 26 MS
In scenario 1 I want to return AL as the State of Residence
-There are multiple names, ages, states of residence entered on the application. If the ages are the same in multiple states of residence, return the state that falls first alphabetically.
Name Age State of Residence
John 25 FL
Will 25 CA
Sue 26 MS
In scenario 2 I want to return CA as the State of Residence
I have tried:
Select State, Age
FROM (
Select * from Table
where ID = @ID and Name IN (State,Age)
) as s
PIVOT
(
MAX(value) FOR Name IN (State,Age)
) as pvt
This returns the data, I just lack the knowledge on how to get the rest of the way. I tried adding group by but it would not work with the PIVOT.
I also tried it this way:
DECLARE @State AS NVARCHAR(2)
SELECT
[State],[Acres]
INTO #tmpTable FROM Table
PIVOT
(
MAX(Value)
FOR [Name]
IN ([State],[Acres])) AS p
WHERE ID = @ID
--
SELECT * FROM #tmpTable
--
DROP TABLE #tmpTable
Edited/Updated information:
I have one table called dbo.Table with ID,NAME,VALUE,INSTANCE columns. In the first scenario, the table looks like this after the user enters two different records:
ID NAME VALUE INSTANCE
1000 FIRST NAME JOHN 1
1000 AGE 25 1
1000 STATE AZ 1
1000 FIRST NAME BILL 2
1000 AGE 27 2
1000 STATE NH 2
I want to return NH as the State of Residence because it has the highest age.
In the second scenario the table looks like:
ID NAME VALUE INSTANCE
1000 FIRST NAME JOHN 1
1000 AGE 25 1
1000 STATE AZ 1
1000 FIRST NAME BILL 2
1000 AGE 25 2
1000 STATE NH 2
I want to return AZ as the State of Residence because it falls first alphabetically.
I am accounting for both scenarios. Neither is guaranteed to occur. The user may only enter one record. Either way a State of Residence is always returned. I hope that makes more sense. I attempted to pivot the data as you see it above, but do not understand how to get the state I need. Thanks.
Edited
OK, I get the point. You're suffering from a badly designed table. It is not even in first normal form (1NF), where each table column (attribute) stores a single value from a single domain.
In your case you have a generic table that accepts any kind of attribute you want. The best option for you is to give it fixed columns in an ordinary 1NF table, like:
CREATE TABLE dbo.table (
instance INTEGER ... ,
first_name VARCHAR... ,
age integer... ,
state CHAR(2)...
)
Anyways, if you're working with legacy applications that you cannot modify, try this out:
-- Create a database VIEW to simulate a 1NF table
CREATE VIEW dbo.view1 AS
SELECT DISTINCT
t.INSTANCE,
(
SELECT
t2.value
FROM
dbo.table t2
WHERE
t2.id = t.id
AND t2.instance = t.instance
AND t2.name = 'FIRST NAME'
) AS FIRST_NAME,
(
SELECT
t2.value
FROM
dbo.table t2
WHERE
t2.id = t.id
AND t2.instance = t.instance
AND t2.name = 'AGE'
) AS AGE,
(
SELECT
t2.value
FROM
dbo.table t2
WHERE
t2.id = t.id
AND t2.instance = t.instance
AND t2.name = 'STATE'
) AS STATE
FROM
dbo.table t
Then...
Question 1 - although all fields are selected you can fetch only state if you want
SELECT
*
FROM
dbo.view1 v
WHERE
v.age = (
SELECT
MAX(v2.age)
FROM
dbo.view1 v2
)
Question 2 - First state according to alphabetical order ascending
SELECT
MIN(v.state) AS state
FROM
dbo.view1 v
WHERE
-- you can define a WHERE condition to meet your requirements here
v.age = 25
If you like it, please upvote this answer or mark it as acceptable :)
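Both scenarios can also be handled by one query. The following is a minimal sketch using Python's sqlite3 module, not SQL Server, and conditional aggregation instead of PIVOT or a view. The function name state_of_residence and the table name t are hypothetical. Each instance is collapsed into one row, then the rows are ordered by age descending and state ascending, and the first one is taken. The age is cast to an integer because VALUE holds text.

```python
import sqlite3


def state_of_residence(rows, app_id):
    """rows: (id, name, value, instance) tuples shaped like dbo.Table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER, name TEXT, value TEXT, instance INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)
    sql = """
        SELECT state
        FROM (SELECT instance,
                     MAX(CASE WHEN name = 'AGE'
                              THEN CAST(value AS INTEGER) END) AS age,
                     MAX(CASE WHEN name = 'STATE' THEN value END) AS state
              FROM t
              WHERE id = ?
              GROUP BY instance) AS p
        ORDER BY age DESC, state ASC   -- highest age first, then alphabetical
        LIMIT 1
    """
    row = conn.execute(sql, (app_id,)).fetchone()
    conn.close()
    return row[0] if row else None


scenario_1 = [
    (1000, "FIRST NAME", "JOHN", 1), (1000, "AGE", "25", 1), (1000, "STATE", "AZ", 1),
    (1000, "FIRST NAME", "BILL", 2), (1000, "AGE", "27", 2), (1000, "STATE", "NH", 2),
]
scenario_2 = [
    (1000, "FIRST NAME", "JOHN", 1), (1000, "AGE", "25", 1), (1000, "STATE", "AZ", 1),
    (1000, "FIRST NAME", "BILL", 2), (1000, "AGE", "25", 2), (1000, "STATE", "NH", 2),
]
print(state_of_residence(scenario_1, 1000))
print(state_of_residence(scenario_2, 1000))
```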

Select rows with similar tags (many-to-many relation) from SQL database

There is my entity table and tags bound to it through another table (many-to-many relation).
I have an entity selected and my goal is to find a set of entities which have as many tags in common with it as possible. This set must be ordered by 'similarity': the number of tags each entity shares with the chosen entity. Tags are the same if they have the same ids.
I'm wondering if there's an elegant and fast way to do that with a single query.
The only solution I see now is to fetch all the tag-entity relations and compute their similarity in my application, and then make another database query to select what I've computed, but it doesn't look very graceful.
Database structure:
entity
id
...
tag
id
name
entity_tag
entity_id
tag_id
Update: final solution for MySQL.
So I have tables for paintings, tags and painting_tag relation. This query fetches similar paintings and their 'similarity index' for previously selected painting.
SELECT site_painting.*, Count(tr.tag_id) as similarity
From site_painting_tag_relation as tr
Inner Join site_painting_tag_relation as tr2 ON ( tr2.tag_id = tr.tag_id and tr2.painting_id = :id )
Left join site_painting on site_painting.id=tr.painting_id
Where tr.painting_id <> :id
Group By tr.painting_id
Having Count(*) > 0
Order By Count(*) DESC, tr.painting_id limit 1
OK, The info on db structure helps a lot --
entity id ...
entity_tag entity_id tag_id
tag id name
Lets look at some example values--
entity
id=100
id=...
id=199
entity_tag
100, 3
100, 5
101, 1
102, 7
...
199, 3
199, 7
tag
id=1
id=...
id=10
So if we denormalize the entity_tag we have
100 3,5
101 1
102 7
199 3,7
And a similarity index of 199 to the rest, on a scale of 0 to 10
100 1 in common
101 0 in common
102 1 in common
199 self, no comparison
And if I have it right, we want to display 100 and 102 as being the highest, no?
Here is a stab at the SQL. It might be something like:
SELECT TOP 10 *
FROM
(SELECT
allET.entity_id,
Count(*) as Similarity
From entity_tag as allET
Inner Join
(Select * From entity_tag Where entity_id = myEID) as myET
On allET.tag_id = myET.tag_id
Where allET.entity_id <> myEID
Group By allET.entity_id
) as sim
Order By Similarity DESC, entity_id
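To check the idea against the example values above, here is a minimal sketch assuming Python's sqlite3 module and the entity_tag layout from the question. The function name similar_to is hypothetical. It self-joins entity_tag on tag_id, counts the shared tags per other entity, and orders by that count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entity_tag (entity_id INTEGER, tag_id INTEGER)")
conn.executemany(
    "INSERT INTO entity_tag VALUES (?, ?)",
    [(100, 3), (100, 5), (101, 1), (102, 7), (199, 3), (199, 7)],
)


def similar_to(entity_id, limit=10):
    """Return (entity_id, shared_tag_count) pairs, most similar first."""
    sql = """
        SELECT other.entity_id, COUNT(*) AS similarity
        FROM entity_tag AS mine
        JOIN entity_tag AS other
          ON other.tag_id = mine.tag_id            -- same tag
         AND other.entity_id <> mine.entity_id     -- but a different entity
        WHERE mine.entity_id = ?
        GROUP BY other.entity_id
        ORDER BY similarity DESC, other.entity_id
        LIMIT ?
    """
    return conn.execute(sql, (entity_id, limit)).fetchall()


print(similar_to(199))
```

Because this is an inner join, entities that share no tags (101 in the example) simply do not appear, so no HAVING clause is needed.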

Keep one instance of duplicate appearing in one of two columns

I've got a table containing one column with unique ID and one column with each unique ID's spouse ID (if they have a spouse). The problem is that each spouse ID also appears in the unique ID column, so when I pull a list, attempting to treat a couple as a single unit, I'm often doublecounting for a single couple.
What's a good, efficient way of taking a given list of unique IDs, checking to see if their spouse is also in the same list of unique IDs, and returning only one unique ID per couple?
The issue is a little more complicated in that sometimes both spouses are not included in the same list, so it's not simply a matter of keeping one person if they're married. In the event that the spouse isn't also in the same list, I want to make sure to retain the one that is. I also want to make sure I'm retaining all people who have a NULL value in the spouse ID column.
Subset of table in question:
Unique_ID Spouse_ID
1 2
2 1
3 NULL
4 NULL
5 10
6 25
7 NULL
8 9
9 8
10 5
In this excerpt, IDs 3, 4, and 7 are all single. IDs 1, 2, 5, 8, 9, and 10 have spouses that appear in the Unique_ID column. ID 6 has a spouse whose ID does not appear in the Unique_ID column. So, I'd want to keep IDs 1 (or 2), 3, 4, 5 (or 10), 6, 7, and 8 (or 9). Hope that makes sense.
My inclination would be to combine the two lists and remove duplicates:
select distinct id
from ((select id
from t
) union all
(select spouse_id
from t
where spouse_id in (select id from t)
)
) t
But your question asked for an efficient way. Another way to think about this is to add a new column which is the spouse id if it is in the id list and NULL otherwise (this uses a left outer join). Then there are three cases:
There is no spouse id in the list, so use the id.
The id is less than the spouse id. Use it.
The spouse id is less than the id. Discard this record, because the spouse's record is being used.
Here is an explicit way of expressing this:
select IdToUse
from (select t.*, tspouse.id tsid,
(case when tspouse.id is null then t.id
when t.id < tspouse.id then t.id
else NULL
end) as IdToUse
from t left outer join
t tspouse
on t.spouse_id = tspouse.id
) t
where IdToUse is not null;
You can simplify this to:
select t.*, tspouse.id tsid,
(case when tspouse.id is null then t.id
when t.id < tspouse.id then t.id
else NULL
end) as IdToUse
from t left outer join
t tspouse
on t.spouse_id = tspouse.id
where tspouse.id is null or
t.id < tspouse.id
Two tables is just plain bad design
Combine the tables
select id
from table
where id < spouseID
or spouseID is null
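A minimal sketch of the one-row-per-couple rule, assuming Python's sqlite3 module and the sample rows from the question. The row (30, 12) is an extra, not in the question, added to cover a person whose absent spouse has the smaller ID. A row is kept when it has no spouse, when its ID is the smaller of the pair, or when the spouse is not in the list at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, spouse_id INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(1, 2), (2, 1), (3, None), (4, None), (5, 10), (6, 25),
     (7, None), (8, 9), (9, 8), (10, 5),
     # Extra row (not in the question): spouse 12 is absent and has a lower ID.
     (30, 12)],
)

sql = """
    SELECT t.id
    FROM t
    WHERE t.spouse_id IS NULL                      -- single
       OR t.id < t.spouse_id                       -- lower ID of the pair
       OR NOT EXISTS (SELECT 1 FROM t AS s         -- spouse not in the list
                      WHERE s.id = t.spouse_id)
    ORDER BY t.id
"""
kept = [row[0] for row in conn.execute(sql)]
print(kept)
```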

SQL - Removing Duplicate without 'hard' coding?

Here's my scenario.
I have a table with 3 columns I want to return from a stored procedure: email, name and id. id must be 3 or 4, and each email must appear only once, as some users have multiple entries.
I have a Select statement as follows
SELECT
DISTINCT email,
name,
id
from table
where
id = 3
or id = 4
OK, fairly simple, but some users have entries with both 3 and 4, so they appear twice. If they appear twice, I want to keep only the row with an id of 4. I'll give another example below, as it's hard to explain.
Table -
Email Name Id
jimmy@domain.com jimmy 4
brian@domain.com brian 4
kevin@domain.com kevin 3
jimmy@domain.com jimmy 3
So in the above scenario I would want to ignore the jimmy with the id of 3, any way of doing this without hard coding?
Thanks
SELECT
email,
name,
max(id)
from table
where
id in( 3, 4 )
group by email, name
Is this what you want to achieve?
SELECT Email, Name, MAX(Id) FROM Table WHERE Id IN (3, 4) GROUP BY Email, Name;
Sometimes using Having Count(*) > 1 may be useful to find duplicated records.
select * from table group by Email having count(*) > 1
or
select * from table group by Email having count(*) > 1 and id > 3
The MAX(Id) solution given earlier looks good for this case.
This may be an alternative solution.
What RDBMS are you using? This will return only one "Jimmy", using RANK():
SELECT A.email, A.name,A.id
FROM SO_Table A
INNER JOIN(
SELECT
email, name,id,RANK() OVER (Partition BY name ORDER BY ID DESC) AS COUNTER
FROM SO_Table B
) X ON X.ID = A.ID AND X.NAME = A.NAME
WHERE X.COUNTER = 1
Returns:
email name id
------------------------------
jimmy@domain.com jimmy 4
brian@domain.com brian 4
kevin@domain.com kevin 3
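The GROUP BY with MAX(id) approach can be checked with a minimal sketch, assuming Python's sqlite3 module and a hypothetical table name users. The kevin row with id 5 is an extra, not in the question, added to show that the WHERE filter is applied before the maximum is taken.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (email TEXT, name TEXT, id INTEGER)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [("jimmy@domain.com", "jimmy", 4),
     ("brian@domain.com", "brian", 4),
     ("kevin@domain.com", "kevin", 3),
     ("jimmy@domain.com", "jimmy", 3),
     # Extra row (not in the question): id 5 is filtered out by the WHERE clause.
     ("kevin@domain.com", "kevin", 5)],
)

sql = """
    SELECT email, name, MAX(id) AS id
    FROM users
    WHERE id IN (3, 4)        -- only the two ids of interest
    GROUP BY email, name      -- one row per user
    ORDER BY email
"""
result = conn.execute(sql).fetchall()
print(result)
```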