SQL sort based on comma-separated string? - sql

Say that I have a table of recipes whose ingredients column holds a comma-separated list (sorry about using a picture, but I can't figure out how to get a nicely formatted table on SO...):
I want to write a query that sorts the ingredients by the number of recipes they appear in. So in this example, we'd want to see the following output:
I was thinking that LIKE and IN might be helpful for this search, but I'm not sure where to go from there.

I really can't resist a "you can't" quote. Your solution follows.
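SELECT Item
FROM
(
    -- Shred each XML fragment into one row per <A> node; LTRIM drops the space after each comma
    SELECT LTRIM(x.XmlCol.value('.', 'varchar(100)')) AS Item
    FROM
    (
        -- 'eggs, milk' becomes <A>eggs</A><A> milk</A>
        SELECT CAST('<A>' + REPLACE(ingredients, ',', '</A><A>') + '</A>' AS XML) AS Ingredient
        FROM #recipes
    ) AS Mytab
    CROSS APPLY Mytab.Ingredient.nodes('/A') AS x(XmlCol)
) AS Listing
GROUP BY Item
ORDER BY COUNT(*) DESC;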
Try it for yourself. In short, you start by replacing the commas with XML separators, then, for efficiency, invoke Microsoft's own XML methods to convert the list of values into tabular output. You then simply drop the lot into a FROM clause and group with an ORDER BY. Note that the CAST to XML fails if an ingredient contains an XML-special character such as & or <.
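SQLite has no XML methods, so the query above can't be run there as written. Below is a rough, runnable sketch of the same split-then-count idea. It uses Python's standard-library sqlite3 module and replaces the XML trick with a recursive CTE that peels one item off the comma-separated string per step. The recipes table, its columns, and the sample data are hypothetical. Like the original, it counts occurrences, so an ingredient listed twice in one recipe would be counted twice.

```python
import sqlite3

# Recursive CTE: each step emits the text before the first comma and keeps the rest.
SPLIT_AND_COUNT = """
WITH RECURSIVE split(recipe, item, rest) AS (
    -- Seed row: nothing emitted yet; append a comma so the last item is terminated
    SELECT recipe, '', ingredients || ',' FROM recipes
    UNION ALL
    -- Emit the trimmed item before the first comma, keep what follows it
    SELECT recipe,
           trim(substr(rest, 1, instr(rest, ',') - 1)),
           substr(rest, instr(rest, ',') + 1)
    FROM split
    WHERE rest <> ''
)
SELECT item, COUNT(*) AS n
FROM split
WHERE item <> ''
GROUP BY item
ORDER BY n DESC, item
"""


def ingredient_frequencies(rows):
    """rows: (recipe, comma-separated ingredients) pairs; returns (item, count) pairs."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE recipes (recipe TEXT, ingredients TEXT)")
        conn.executemany("INSERT INTO recipes VALUES (?, ?)", rows)
        return conn.execute(SPLIT_AND_COUNT).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    sample = [
        ("pancakes", "flour, eggs, milk"),
        ("omelette", "eggs, milk"),
        ("bread", "flour, water"),
    ]
    print(ingredient_frequencies(sample))
```

The WHERE item <> '' filter drops the empty seed rows. The secondary sort on item only makes ties come out in a predictable order.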

Related

In SQL how can I remove duplicates on a column without including it in the SELECT command/ the extract

I'm trying to remove duplicates on a column in SQL, without including that column in the extract (since it contains personally identifiable data). I thought I might be able to do this with nested queries (as below), however this isn't working. I also thought it might be possible to remove duplicates in the WHERE statement, but couldn't find anything from googling. Any ideas? Thanks in advance.
SELECT [ETHNIC], [RELIGION]
FROM
(SELECT DISTINCT [ID], [ETHNIC], [RELIGION]
FROM MainData)
Using DISTINCT like that applies it to the whole row, so if two rows share an ID but differ in ETHNIC or RELIGION, DISTINCT won't remove either of them. To get one row per ID you could use GROUP BY in your inner query, but then the other columns need an aggregate (e.g. MAX):
SELECT [ETHNIC], [RELIGION]
FROM
(SELECT [ID], MAX([ETHNIC]) AS ETHNIC, MAX([RELIGION]) AS RELIGION
FROM MainData
GROUP BY [ID])
Also, some SQL dialects (SQL Server and MySQL, for example) require a derived table to have a name, so if you get a syntax error, try adding an alias such as AS X after the closing parenthesis of the inner select.
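Here is a minimal sketch of the GROUP BY approach, run against an in-memory SQLite database through Python's sqlite3 module. The sample rows are hypothetical. The derived table is aliased (AS X), and an ORDER BY is added only to make the output deterministic. MAX is applied to each column independently, so if duplicate rows for one ID disagree, the surviving row can mix values from different source rows.

```python
import sqlite3

# One row per ID: GROUP BY collapses duplicates, the outer query drops ID itself.
ONE_ROW_PER_ID = """
SELECT ETHNIC, RELIGION
FROM (
    SELECT ID, MAX(ETHNIC) AS ETHNIC, MAX(RELIGION) AS RELIGION
    FROM MainData
    GROUP BY ID
) AS X
ORDER BY ETHNIC, RELIGION
"""


def dedupe_by_id(rows):
    """rows: (ID, ETHNIC, RELIGION) tuples; returns (ETHNIC, RELIGION) pairs, one per ID."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE MainData (ID INTEGER, ETHNIC TEXT, RELIGION TEXT)")
        conn.executemany("INSERT INTO MainData VALUES (?, ?, ?)", rows)
        return conn.execute(ONE_ROW_PER_ID).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    data = [(1, "A", "P"), (1, "A", "Q"), (2, "B", "P"), (3, "A", "P")]
    print(dedupe_by_id(data))
```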

SQL Syntax - Why do we need to list individual fields in an SQL group-by statement?

My understanding of using summary functions in SQL is that each field in the select statement that doesn't use a summary function, should be listed in the group by statement.
select a, b, c, sum(n) as sum_of_n
from table
group by a, b, c
My question is, why do we need to list the fields? Shouldn't the SQL syntax parser be implemented in a way that we can just tell it to group and it can figure out the groups based on whichever fields are in the select and aren't using summary functions?:
select a, b, c, sum(n) as sum_of_n
from table
group
I feel like I'm unnecessarily repeating myself when I write SQL code. What circumstances exist where we would not want it to automatically figure this out, or where it couldn't automatically figure this out?
To decrease the chances of errors in your statement. Explicitly spelling out the GROUP BY columns helps to ensure that the user wrote what they intended to write. You might be surprised at the number of posts on Stack Overflow in which the user is grouping on columns that make no sense and has no idea why they aren't getting the data they expect.
Also, consider the scenario where a user might want to group on more columns than are actually in the SELECT statement. For example, if I wanted the average of each customer's largest invoice amount, I might write something like this:
SELECT
AVG(max_amt)
FROM (SELECT MAX(amt) AS max_amt FROM Invoices GROUP BY customer_id) SQ
In this case I can't simply write GROUP; I need to spell out the column(s) on which I'm grouping. The SQL engine could let the user list columns explicitly and fall back to a default when none are listed, but then the chance of bugs increases drastically.
One way to think of it is like strongly typed programming languages. Making the programmer explicitly spell things out decreases the chance of bugs popping up because the engine made an assumption that the programmer didn't expect.
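To check the query shape, here is a small sketch using Python's sqlite3 module with made-up invoice data. The inner query has to name its aggregate (MAX(amt) AS max_amt) so that the outer AVG can refer to it. It also groups by customer_id even though that column never appears in any SELECT list.

```python
import sqlite3

AVG_OF_MAX = """
SELECT AVG(max_amt)
FROM (
    -- grouped by customer_id, which never appears in the output
    SELECT MAX(amt) AS max_amt
    FROM Invoices
    GROUP BY customer_id
) AS SQ
"""


def average_of_customer_max(invoices):
    """invoices: (customer_id, amt) rows; returns the average per-customer maximum."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE Invoices (customer_id INTEGER, amt REAL)")
        conn.executemany("INSERT INTO Invoices VALUES (?, ?)", invoices)
        return conn.execute(AVG_OF_MAX).fetchone()[0]
    finally:
        conn.close()


if __name__ == "__main__":
    # Per-customer maxima are 30, 20 and 40, so the average is 30.0
    print(average_of_customer_max([(1, 10), (1, 30), (2, 20), (2, 5), (3, 40)]))
```

With no invoices at all, AVG over an empty set yields NULL, which comes back to Python as None.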
This is required so that you state explicitly how you want the records grouped because, for example, you may group on columns that are not listed in the result set.
However, some RDBMSs, such as MySQL, are more permissive and let you select non-aggregated columns that are not listed in the GROUP BY clause.
My first reaction would be that 'it is what it is' =)
But on thinking it through, the reason T-SQL works like this is that the SELECT and the GROUP BY are two distinct parts of all the operations going on in the query.
This might not be the best example, but it does show that you can GROUP on different (well, 'more') fields than you are actually SELECTing.
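-- Create an empty #test table with these column types (WHERE 1 = 2 selects no rows)
SELECT brand = CONVERT(varchar(100), ''), model = CONVERT(varchar(100), ''), some_number = CONVERT(int, 0)
INTO #test
WHERE 1 = 2;
INSERT #test (brand, model, some_number)
VALUES ('Ford', 'Focus', 10),
('Ford', 'Focus', 25),
('Ford', 'Kagu', 23),
('DMC', '12', 88);
-- One row per brand/model: (Ford, Focus, 25), (Ford, Kagu, 23), (DMC, 12, 88)
SELECT brand, model, MAX(some_number)
FROM #test
GROUP BY brand, model;
-- Same grouping, but model is not selected: Ford now appears twice (25 and 23)
SELECT brand, MAX(some_number)
FROM #test
GROUP BY brand, model;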
Not all RDBMSs are like this. MySQL, for example, can allow fields in the SELECT that are omitted from the GROUP BY (since MySQL 5.7.5 the default ONLY_FULL_GROUP_BY mode rejects such queries unless the extra column is functionally dependent on the grouped columns). From what I've seen, it then picks an arbitrary value from each group ('there is no such thing as an implicit first') and uses that in the SELECT. I think so, at least; my knowledge of MySQL is rather limited, but I've seen some examples here and there and they always confused me, as I'm used to the strict requirement of T-SQL you just described.
In addition, you can list the GROUP BY columns in a different order from the SELECT list:
select a, b, c, sum(d)
from table
group by c,a,b
Many databases (MySQL and PostgreSQL, for example, though not SQL Server) also let you skip the column names and refer to the columns by their position in the SELECT list:
select a, b, c, sum(d)
from table
group by 3,1,2
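A quick way to convince yourself that neither the column order nor the positional form changes the grouping is to run all three variants against the same data. This sketch uses SQLite through Python's sqlite3 module (SQLite accepts positional GROUP BY terms; SQL Server does not). The table t and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a TEXT, b TEXT, c TEXT, d INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [("x", "p", "m", 1), ("x", "p", "m", 2), ("x", "q", "m", 5), ("y", "p", "n", 7)],
)


def grouped(group_by):
    """Run the same SELECT with a different GROUP BY clause (trusted literal input only)."""
    sql = f"SELECT a, b, c, SUM(d) FROM t GROUP BY {group_by} ORDER BY a, b, c"
    return conn.execute(sql).fetchall()


by_name = grouped("a, b, c")
reordered = grouped("c, a, b")
positional = grouped("3, 1, 2")  # integers refer to SELECT-list positions
print(by_name)
```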

Summarizing a table result in SQL

Given the below table as a SQL Result:
I want to use the above generated table and produce a table which clubs the given information into:
I have multiple areaName and multiple functionNames and multiple users. Please let me know if this is possible and how?
I have tried a couple of things, but I am drained now and need a direction. Any help is appreciated.
Even pseudocode would help; I can try to make use of it. Start from the SQL result as a given table.
Use correlated subqueries to achieve the desired result. I've provided an example below. Test it for the first summary column and, if it works, add your other summary columns. Hopefully this makes sense and helps. Alternatively, you could use a CTE (common table expression) to achieve similar results.
SELECT a.areaName, a.functionName
, (SELECT count(DISTINCT b.UserKey)
from AREAS b
where a.areaName = b.areaName
and a.functionName = b.functionName
and b.[1-add] = 1) as UsersinAdd
-- Lather/rinse/repeat for other summary columns
FROM AREAS a
group by a.areaName, a.functionName
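The correlated-subquery approach can be tried end to end in SQLite from Python. This sketch assumes a hypothetical AREAS table with only two permission flag columns ("1-add" and "8-correction", stored as 0 or 1) and made-up rows. Each extra summary column is another copy of the subquery with a different flag.

```python
import sqlite3

# One correlated subquery per permission column; each counts distinct users
# in the same area/function whose flag is 1.
SUMMARY = """
SELECT a.areaName, a.functionName,
       (SELECT COUNT(DISTINCT b.UserKey) FROM AREAS b
         WHERE b.areaName = a.areaName AND b.functionName = a.functionName
           AND b."1-add" = 1) AS UsersInAdd,
       (SELECT COUNT(DISTINCT b.UserKey) FROM AREAS b
         WHERE b.areaName = a.areaName AND b.functionName = a.functionName
           AND b."8-correction" = 1) AS UsersInCorrection
FROM AREAS a
GROUP BY a.areaName, a.functionName
ORDER BY a.areaName, a.functionName
"""


def summarize(rows):
    """rows: (areaName, functionName, UserKey, add_flag, correction_flag) tuples."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            'CREATE TABLE AREAS (areaName TEXT, functionName TEXT, '
            'UserKey INTEGER, "1-add" INTEGER, "8-correction" INTEGER)'
        )
        conn.executemany("INSERT INTO AREAS VALUES (?, ?, ?, ?, ?)", rows)
        return conn.execute(SUMMARY).fetchall()
    finally:
        conn.close()
```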
Your problem stems from the de-normalised structure of your table. Columns [1-add],...,[8-correction] should be values in a column, not columns. This leads to more complex queries, as you have discovered.
The UNPIVOT operator allows you to correct this mistake.
select areaname, functionname, rights, count(distinct userkey)
from
(
select * from yourtable
unpivot (permission for rights in ([1-add], [2-update/display], [4-update/display all], [8-correction])) u
) v
-- UNPIVOT drops NULLs but keeps zeros, so count only users who actually hold the right
where permission = 1
group by areaname, functionname, rights
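SQLite has no UNPIVOT operator, but stacking one SELECT per permission column with UNION ALL produces the same long shape. The following Python/sqlite3 sketch assumes a hypothetical AREAS table with two 0/1 flag columns ("1-add" and "8-correction") and made-up rows. Adding a permission column means adding one more UNION ALL branch.

```python
import sqlite3

# Emulate UNPIVOT: one SELECT per permission column, stacked with UNION ALL,
# so each (user, right) pair becomes its own row.
LONG_FORM = """
WITH unpivoted AS (
    SELECT areaName, functionName, UserKey, '1-add' AS rights, "1-add" AS permission
    FROM AREAS
    UNION ALL
    SELECT areaName, functionName, UserKey, '8-correction', "8-correction"
    FROM AREAS
)
SELECT areaName, functionName, rights, COUNT(DISTINCT UserKey) AS users
FROM unpivoted
WHERE permission = 1
GROUP BY areaName, functionName, rights
ORDER BY areaName, functionName, rights
"""


def summarize_long(rows):
    """rows: (areaName, functionName, UserKey, add_flag, correction_flag) tuples."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            'CREATE TABLE AREAS (areaName TEXT, functionName TEXT, '
            'UserKey INTEGER, "1-add" INTEGER, "8-correction" INTEGER)'
        )
        conn.executemany("INSERT INTO AREAS VALUES (?, ?, ?, ?, ?)", rows)
        return conn.execute(LONG_FORM).fetchall()
    finally:
        conn.close()
```

Filtering on permission = 1 keeps users whose flag is 0 from being counted.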

SQL pattern to get "and" list of multiple-row matches?

I'm not a database programmer, but I have a simple database-backed app where I have items with tags. Each item may have multiple tags, so I'm using a typical junction table, where each row represents the fact that the item with the given ID has the tag with the given ID.
This works very logically when I want to do something like select all items with a given tag.
But, what is the typical pattern for doing AND searches? That is, what if I want to find all items which have all of a certain set of tags? This is such a common operation that I'd think some of the intro tutorials would cover it, but I guess I'm not looking in the right places.
The approach I tried was to use INTERSECT, first directly and then with subqueries and IN. This works, but the queries get long quickly as I add search terms. And, crucially, this approach appears to be about an order of magnitude slower than shoving all the tags as text into one "tags" column and using SQLite's full-text search. (And, as I would expect/hope, the FTS search gets faster as I add more terms, which doesn't seem to be the case with the INTERSECT approach.)
What's the proper design pattern here, and what's the right way to make it snappy? I'm using SQLite in this case, but I'm most interested in a general answer, since this must be a common thing to do.
The following is the standard ANSI SQL solution, which avoids having to keep the number of ids and the ids themselves in sync. It assumes the junction table (here called tags) has item_id and tag_id columns.
with tag_ids (tid) as (
values (1), (2)
)
select item_id
from tags
where tag_id in (select tid from tag_ids)
group by item_id
having count(*) = (select count(*) from tag_ids);
The VALUES clause ("row constructor") is supported by PostgreSQL and DB2. For databases that don't support that, you can replace it with a simple SELECT; e.g. in Oracle this would be:
with tag_ids (tid) as (
select 1 as tid from dual
union all
select 2 from dual
)
select item_id
from tags
where tag_id in (select tid from tag_ids)
group by item_id
having count(*) = (select count(*) from tag_ids);
For SQL Server you would simply leave out the "from dual", as it does not require a FROM clause for a SELECT.
This assumes that a given tag can be assigned to an item only once. If that isn't the case, you would need to use count(distinct tag_id) in the HAVING clause.
I would be inclined to use a GROUP BY (again assuming item_id and tag_id columns in the junction table):
select item_id
from tags
where tag_id in (<tag1>, <tag2>)
group by item_id
having count(*) = 2
This guarantees that both tags appear.
For a list of unlimited size, you could store the ids in a string, such as '|tag1|tag2|tag3|' (note the delimiters at both ends). Then you can do:
select item_id
from tags
where @taglist like '%|' + cast(tag_id as varchar(20)) + '|%'
group by item_id
having count(*) = len(@taglist) - len(replace(@taglist, '|', '')) - 1
This is using SQL Server syntax, but it says two things. The WHERE clause says that the tag is in the list. The HAVING clause says that the number of matches equals the length of the list. It gets that length with a trick: counting the separators and subtracting 1.
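The GROUP BY/HAVING pattern (often called relational division) can be tried in SQLite, which accepts a VALUES list inside a CTE. This Python sketch builds one placeholder per wanted tag and deduplicates the wanted ids so the HAVING count stays correct. It uses COUNT(DISTINCT tag_id) so that a repeated assignment can't inflate the count. The item_tags table and its item_id/tag_id columns are hypothetical stand-ins for the junction table.

```python
import sqlite3


def items_with_all_tags(pairs, wanted):
    """Return the item_ids that carry every tag in `wanted`.

    pairs: (item_id, tag_id) rows for a hypothetical item_tags junction table.
    """
    wanted = sorted(set(wanted))  # duplicates would break the HAVING count
    if not wanted:
        raise ValueError("wanted must contain at least one tag id")
    placeholders = ", ".join("(?)" for _ in wanted)
    sql = f"""
        WITH tag_ids(tid) AS (VALUES {placeholders})
        SELECT item_id
        FROM item_tags
        WHERE tag_id IN (SELECT tid FROM tag_ids)
        GROUP BY item_id
        HAVING COUNT(DISTINCT tag_id) = (SELECT COUNT(*) FROM tag_ids)
        ORDER BY item_id
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE item_tags (item_id INTEGER, tag_id INTEGER)")
        conn.executemany("INSERT INTO item_tags VALUES (?, ?)", pairs)
        return [row[0] for row in conn.execute(sql, wanted)]
    finally:
        conn.close()
```

In a real app the junction table would already exist and be indexed on (tag_id, item_id). The in-memory table here only keeps the sketch self-contained.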

MS Access SQL: select rows with the same order as IN clause

I know that this question has been asked several times, and I've read all the answers, but none of them seem to completely solve my problem.
I'm switching from a MySQL database to an MS Access database. In both cases I use a PHP script to connect to the database and perform SQL queries.
I need to find a suitable replacement for a query I used to perform on MySQL.
I want to:
perform a first query and order records alphabetically based on one of the columns
construct a list of IDs which reflects the previous alphabetical order
perform a second query with the IN clause applied with the IDs' list and ordered by this list.
In MySQL I used to perform the last query this way:
SELECT name FROM users WHERE id IN ($name_ids) ORDER BY FIND_IN_SET(id,'$name_ids')
Since FIND_IN_SET is available only in MySQL, and CHARINDEX and PATINDEX are not available from my PHP script, how can I achieve this?
I know that I could write something like:
SELECT name
FROM users
WHERE id IN ($name_ids)
ORDER BY CASE id
WHEN ... THEN 1
WHEN ... THEN 2
WHEN ... THEN 3
WHEN ... THEN 4
END
but you have to consider that:
The IDs' list varies in length and content because it depends on the first query
That list can easily contain thousands of elements
Have you got any hint on this?
Is there a way to programmatically construct the ORDER BY CASE ... WHEN ... statement?
Is there a better approach since my list of IDs can be big?
UPDATE: I perform two separate queries because I need to access two different tables.
The database is not very simple, so I'll try to give an example:
Suppose I have a table which contains a list of users and a table which contains all the books that every user has on their bookshelf.
Since the database was designed in MySQL, for every book record I store the user_id in the books table in order to have a relationship between the user and the book.
Suppose now that I want to obtain a list of all the users that have books with a title starting with the letter 'a', and I want to order the users based on the alphabetical order of the books.
This is what I do:
perform a first query to find all the books which start with the letter 'a' and sort them alphabetically
create a list of user_ids which should reflect the alphabetical order of the books
perform a query on the users table to find the users' names and sort them by the user_id list to get the required sorting by book
I hope this clarifies what I need.
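On building the ORDER BY CASE ... WHEN ... list programmatically, here is a minimal sketch in Python that uses placeholders rather than pasting the ids into the SQL text. It runs against SQLite. Access SQL itself has no CASE expression (it offers IIf and Switch instead), so the generated text would need adapting there. The users table and data are hypothetical. With thousands of ids the statement needs three parameters per id, which can hit a driver's or database's limit on bound parameters.

```python
import sqlite3


def order_by_case_query(ids):
    """Build SQL that returns users in the order of `ids`, plus its parameters."""
    if not ids:
        raise ValueError("ids must not be empty")
    in_list = ", ".join("?" for _ in ids)
    whens = " ".join("WHEN ? THEN ?" for _ in ids)
    sql = (
        f"SELECT name FROM users WHERE id IN ({in_list}) "
        f"ORDER BY CASE id {whens} END"
    )
    params = list(ids)
    for position, user_id in enumerate(ids):
        params += [user_id, position]  # WHEN <id> THEN <position>
    return sql, params


def names_in_order(users, ids):
    """users: (id, name) rows for a hypothetical users table."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
        conn.executemany("INSERT INTO users VALUES (?, ?)", users)
        sql, params = order_by_case_query(ids)
        return [name for (name,) in conn.execute(sql, params)]
    finally:
        conn.close()
```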
If I understand correctly, you're trying to get a set of information in the same order that you specify the ID values. There is a hack that can convert a list into a table using XML and CROSS APPLY. This can be combined with the ROW_NUMBER function to generate your sort order. See the code below:
CREATE FUNCTION [dbo].[GetNvarcharsFromXmlArray]
(
@Strings xml = N'<ArrayOfStrings/>'
)
RETURNS TABLE
AS
RETURN
(
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS RowNumber, Strings.String.value('.', 'nvarchar(MAX)') AS String
FROM @Strings.nodes('/ArrayOfStrings/string/text()') AS Strings(String)
)
It works on XML with the following structure:
<ArrayOfStrings>
<string>myvalue1</string>
<string>myvalue2</string>
</ArrayOfStrings>
This is close to the format .NET's XmlSerializer produces for string arrays (XmlSerializer names the root element ArrayOfString).
If you want to pass a comma separated list, you can simply use:
CREATE FUNCTION [dbo].[GetNvarcharsCSV]
(
@CommaSeparatedStrings nvarchar(MAX) = N''
)
RETURNS TABLE
AS
RETURN
(
-- An inline table-valued function must be a single SELECT, so the CSV is
-- converted to XML in a derived table instead of with DECLARE/SET
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS RowNumber, Strings.String.value('.', 'nvarchar(MAX)') AS String
FROM (SELECT CONVERT(xml, N'<ArrayOfStrings><string>' + REPLACE(@CommaSeparatedStrings, ',', N'</string><string>') + N'</string></ArrayOfStrings>') AS Doc) AS Src
CROSS APPLY Src.Doc.nodes('/ArrayOfStrings/string/text()') AS Strings(String)
)
This makes your query:
SELECT name
FROM users
INNER JOIN dbo.GetNvarcharsCSV(@name_ids) AS IDList ON users.ID = IDList.String
ORDER BY RowNumber
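Note that it's a pretty simple rewrite to make the function return a table of integers if that's what you need. Also be aware that ROW_NUMBER() OVER (ORDER BY (SELECT 1)) is not formally guaranteed to number the rows in document order, although in practice it usually does.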
You can see xml Data Type Methods to get a better understanding of what you can do with XML in SQL queries. Also, see ROW_NUMBER (Transact-SQL).
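The same idea does not depend on XML: turn the ordered id list into a table whose position column drives the ORDER BY. Below is a minimal sketch with Python's sqlite3 module, assuming a hypothetical users table. The ids are written to a temporary id_list table together with their positions, then joined back to users.

```python
import sqlite3


def names_in_id_order(users, ordered_ids):
    """users: (id, name) rows; returns names in the order given by ordered_ids."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
        conn.executemany("INSERT INTO users VALUES (?, ?)", users)
        # The ordered list becomes a table; its pos column carries the sort order
        conn.execute("CREATE TEMP TABLE id_list (pos INTEGER PRIMARY KEY, id INTEGER)")
        conn.executemany(
            "INSERT INTO id_list (pos, id) VALUES (?, ?)", enumerate(ordered_ids)
        )
        rows = conn.execute(
            "SELECT u.name FROM users AS u "
            "INNER JOIN id_list AS l ON u.id = l.id "
            "ORDER BY l.pos"
        ).fetchall()
        return [name for (name,) in rows]
    finally:
        conn.close()
```

Unlike an IN list, the join keeps repeated ids (one row per occurrence), which matters if the same user owns several matching books.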
It sounds like you need a JOIN...
This should work, although it may need to be translated to Access syntax (which is apparently subtly different):
SELECT b.name, a.title
FROM book as a
INNER JOIN user as b
ON b.id = a.userId
WHERE SUBSTRING(LOWER(a.title), 1, 1) = 'a'
ORDER by a.title
I don't know why you're switching to Access, although I have heard it's been improving in recent years. I think I'd prefer almost any other RDBMS, though. And your schema could probably stand some tweaking, from the sound of things.
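To see that the single JOIN gives the book-ordered list directly, this sketch runs the query in SQLite from Python. The book and users tables and their rows are hypothetical. SQLite's substr and lower functions stand in for SUBSTRING and LOWER.

```python
import sqlite3

USERS_BY_BOOK_TITLE = """
SELECT b.name, a.title
FROM book AS a
INNER JOIN users AS b ON b.id = a.userId
WHERE substr(lower(a.title), 1, 1) = 'a'   -- titles starting with 'a' or 'A'
ORDER BY a.title
"""


def users_by_book_title(users, books):
    """users: (id, name) rows; books: (title, userId) rows. Both tables are hypothetical."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
        conn.execute("CREATE TABLE book (title TEXT, userId INTEGER)")
        conn.executemany("INSERT INTO users VALUES (?, ?)", users)
        conn.executemany("INSERT INTO book VALUES (?, ?)", books)
        return conn.execute(USERS_BY_BOOK_TITLE).fetchall()
    finally:
        conn.close()
```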
You would have to use a user-defined function that maintains the order, and then order by that column. For example:
CREATE FUNCTION dbo.SplitList
(
@List VARCHAR(8000)
)
RETURNS TABLE
AS
RETURN
(
SELECT DISTINCT
[Rank],
[Value] = CONVERT(INT, LTRIM(RTRIM(SUBSTRING(@List, [Rank],
CHARINDEX(',', @List + ',', [Rank]) - [Rank]))))
FROM
(
SELECT TOP (8000) [Rank] = ROW_NUMBER()
OVER (ORDER BY s1.[object_id])
FROM sys.all_objects AS s1
CROSS JOIN sys.all_objects AS s2
) AS n
WHERE [Rank] <= LEN(@List)
AND SUBSTRING(',' + @List, [Rank], 1) = ','
);
GO
Now your query can look something like this:
SELECT u.name
FROM dbo.users AS u
INNER JOIN dbo.SplitList($name_ids) AS s
ON u.id = s.Value
ORDER BY s.[Rank];
You may have to surround $name_ids with single quotes (dbo.SplitList('$name_ids')) depending on how the SQL statement is constructed. You may want to consider using a stored procedure instead of building this query in PHP.
You might also consider skipping MS-Access as a hopping point altogether. Why not just have PHP communicate directly with SQL Server?