SQL Query using IN but where all results have to have the same grouping ID - sql

Unsure exactly how to put this question, so apologies for the awkward phrasing; usually, once I know how to properly describe a problem, I can use the site search or Google to look for it.
Anyway, simple enough issue - I'm querying a link / junction table with a variable array of AttributeIds, for example the IDs 14 and 17. Using the following table:
http://img220.imageshack.us/img220/5999/setattributecombo.gif
The only valid result I want to return from this query is where the ProductSetId is the same, so instead of 'IN' I want something like
WHERE AttributeIds IN 14,17 AND ProductSetId is the same
In the above example, the only valid result would be 5 but if I use an IN query I get 2,5,7 as results.
Any ideas?

SELECT ProductSetID, COUNT(*) AS CountOfMatchingRows
FROM MyTable
WHERE AttributeId IN (14, 17)
GROUP BY ProductSetID
HAVING COUNT(*) = 2
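If the junction table could ever hold duplicate (ProductSetId, AttributeId) rows, a slightly safer variation of the same query (same table and columns as above, assuming the database supports COUNT(DISTINCT ...)) is to count distinct attributes instead:
SELECT ProductSetID
FROM MyTable
WHERE AttributeId IN (14, 17)
GROUP BY ProductSetID
HAVING COUNT(DISTINCT AttributeId) = 2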

It sounds like you want to perform Relational Division. A good article on various methods is over at Simple-Talk. http://www.simple-talk.com/sql/t-sql-programming/divided-we-stand-the-sql-of-relational-division/

Related

What does * mean in SQL?

For example, I know what SELECT * FROM example_table; means. However, I feel uncomfortable not knowing what each part of the code means.
The second part of a SQL query is the name of the column you want to retrieve for each record you are getting.
You can obviously retrieve multiple columns for each record, and (only if you want to retrieve all the columns) you can replace the list of them with *, which means "all columns".
So, in a SELECT statement, writing * is the same as listing all the columns the entity has.
Here you can find probably the best tutorial for SQL learning.
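For example, assuming a hypothetical table called employees with columns id, name and salary, these two statements return the same result:
SELECT * FROM employees;
SELECT id, name, salary FROM employees;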
I am providing the answer by breaking down each part of the code.
SELECT == tells the database to retrieve data from the named table.
(*) == means all columns {up to here the code means "retrieve every column"}.
FROM == indicates which table the data has to be selected from.
example_table == this is the name of the table from which we have to select the data.
The overall meaning is:
retrieve all columns from the table named example_table.
Thanks.
For a beginner, knowing the following concepts can be really useful:
SELECT refers to the attributes you want displayed in your final query result. There are variants such as SELECT DISTINCT, which returns only unique values (in case there are duplicate values in the original query result).
FROM basically means which table you want the data from. There can be one or many tables listed in the FROM clause.
WHERE specifies the condition the rows have to satisfy. You can also order the results with ORDER BY ... DESC (there is little point in writing ASC explicitly, since ascending is the default direction when you use the ORDER BY clause).
Refer to W3schools for a better understanding.
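To tie those pieces together, here is a small example against the same hypothetical employees table used above:
SELECT DISTINCT name
FROM employees
WHERE salary > 50000
ORDER BY name DESC;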

SQLite: How to SELECT "most recent record for each user" from single table with composite key?

I'm not a database guru and feel like I'm missing some core SQL knowledge to grok a solution to this problem. Here's the situation as briefly as I can explain it.
Context:
I have a SQLite database table that contains timestamped user event records. The records can be uniquely identified by the combination of timestamp and user ID (i.e., when the event took place and who the event is about). I understand this situation is called a "composite primary key." The table looks something like this (with a bunch of other columns removed, of course):
sqlite> select Last_Updated,User_ID from records limit 4;
Last_Updated User_ID
------------- --------
1434003858430 1
1433882146115 3
1433882837088 3
1433964103500 2
Question: How do I SELECT a result set containing only the most recent record for each user?
Given the above example, what I'd like to get back is a table that looks like this:
Last_Updated User_ID
------------- --------
1434003858430 1
1433882837088 3
1433964103500 2
(Note that the result set only includes user 3's most recent record.)
In reality, I have approximately 2.5 million rows in this table.
Bonus: I've been reading answers about JOINs, de-dupe procedures, and a bunch more, and I've been googling for tutorials/articles in the hopes that I would find what I'm missing. I have extensive programming background so I could de-dupe this dataset in procedural code like I've done a hundred times before, but I'm tired of writing scripts to do what I believe should be possible in SQL. That's what it's for, right?
So, what do you think is missing from my understanding of SQL, conceptually, that I need in order to understand why the solution you've provided to my question actually works? (A reference to a good article that actually explains the theory behind the practice would suffice.) I want to know WHY the solution actually works, not just that it does.
Many thanks for your time!
You could try this:
select user_id, max(last_updated) as latest
from records
group by user_id
This should give you the latest record per user. I assume you have an index on user_id and last_updated combined.
In the above query, generally speaking, we are asking the database to group the user_id records. If there is more than one record for user_id 1, they will all be grouped together. From that group, the maximum last_updated will be picked for the output. Then the next group is processed and the same operation is applied there.
If you have a composite index, sqlite will likely just use the index because the index contains both fields addressed in the query. Indexes are smaller than the table itself, so scanning or seeking is faster.
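If you also need the other columns from each user's most recent row (the question mentions the table has more columns than the two shown), one portable approach, sketched here on the assumption that (Last_Updated, User_ID) is unique, is to join the table back to that aggregate:
SELECT r.*
FROM records r
JOIN (
    SELECT User_ID, MAX(Last_Updated) AS latest
    FROM records
    GROUP BY User_ID
) m ON r.User_ID = m.User_ID AND r.Last_Updated = m.latest;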
Well, in true "d'oh!" fashion, right after I ask this question, I find the answer.
For my case, the answer is:
SELECT MAX(Last_Updated),User_ID FROM records GROUP BY User_ID
I was making this more complicated than it needed to be by thinking I needed to use JOINs and stuff. Applying an aggregate function like MAX() is all that's needed to select only those rows whose content matches the function result. That means this statement…
SELECT MAX(Last_Updated),User_ID FROM records
…would therefore return a result set containing only 1 row, the most recent event.
By adding the GROUP BY clause, however, the result set contains a row for each "group" of results, i.e., for each user. My programmer-brain did not understand that GROUP BY is how we say "for each" in SQL. I think I get it now.
Note to self: keep it simple, stupid. :)

Microsoft Access SQL STDEV of COUNT of data

I have a table in MS Access 2010 that I'm trying to analyze, of people who belong to various groups and have completed various jobs. What I would like to do is calculate the standard deviation of the count of the number of jobs each person has completed, per group. Meaning, the output I would like is, for each group, a number that constitutes the standard deviation of how many jobs each person did.
The data is structured like this:
OldGroup, OldPerson, JobID
I know that I need to do a COUNT of the job IDs by Group and Person. I tried creating a subquery to work with, but that didn't work:
SELECT data.OldGroup, STDEV(
SELECT COUNT(data.JobID)
FROM data
WHERE data.Classification = 1
GROUP BY data.OldGroup, data.OldPerson
)
FROM data
GROUP BY data.OldGroup;
This returned an error "At most one record can be returned by this subquery," which I know is wrong, since when I tried to run the subquery as a standalone query it successfully returned more than one record.
Question:
How can I get the STDEV of a COUNT?
Subquestion: If this question can be answered by correcting incorrect syntax in my examples, please do so.
A minor change in strategy that wouldn't work for all cases but did end up working for this one seemed to take care of the problem. Instead of sticking the subquery in the SELECT statement, I put it in FROM, mimicking creating a separate table.
As such, my code looks like:
SELECT OldGroup, STDEV(NumberJobs) AS JobsStDev
FROM (
    SELECT OldGroup, OldPerson, COUNT(JobID) AS NumberJobs
    FROM data
    WHERE data.Classification = 1
    GROUP BY OldGroup, OldPerson
) AS TempTable
GROUP BY OldGroup;
That seemed to get the job done.
Try doing a make-table query for "SELECT COUNT(data.JobID)...."
Then for the 2nd query, use the new base table.
Sometimes it is just easier to do something in 2 or more queries.
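A rough sketch of that two-query approach in Access SQL, where the intermediate table name JobCounts is hypothetical: the first statement is a make-table query, and the second aggregates over it.
SELECT OldGroup, OldPerson, COUNT(JobID) AS NumberJobs
INTO JobCounts
FROM data
WHERE Classification = 1
GROUP BY OldGroup, OldPerson;

SELECT OldGroup, STDEV(NumberJobs) AS JobsStDev
FROM JobCounts
GROUP BY OldGroup;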

Is there a UniData SQL equivalent to the UniQuery SAMPLE keyword?

I'm using UniData 6. Is there a UniData SQL equivalent to the UniQuery SAMPLE keyword?
Using UniQuery, I've always been able to do:
SELECT CUST BY NAME SAMPLE 1
and it would give me the record with the first alphabetical name.
In UniData SQL, I'd like to be able to do something like:
SELECT NAME FROM CUST ORDER BY NAME SAMPLE 1;
...or, as in other SQL databases...
SELECT TOP 1 NAME FROM CUST ORDER BY NAME;
and get just the name of the the customer who's listed first alphabetically. Is there a keyword like this?
Unfortunately, no, there does not appear to be a UniSQL equivalent to the UniQuery SAMPLE keyword. UniSQL consists of a subset of ANSI SQL-92 standards, with some extensions to support multivalue. However, ANSI SQL-92 does not contain a standard for limiting the result set returned from a query, which is why various DBMS have different syntax for doing so.
ANSI SQL-2008 added the FETCH FIRST clause which is the standard way of implementing a limit to the number of rows returned by a query. It would require a pretty significant update to bring UniSQL up to recent standards since it is now 20+ years behind. There doesn't seem to be significant enough demand in the user community to undertake that effort.
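For reference, the SQL-2008 form (not available in UniSQL) would look something like this:
SELECT NAME FROM CUST ORDER BY NAME FETCH FIRST 1 ROW ONLY;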
Depending on your file's schema, you may be able to apply a workaround. If you are using an auto-incrementing key, you could use a syntax such as:
SELECT foo
FROM bar
WHERE #ID <= 10
The above query would apply a de facto limit to the number of rows returned.
SELECT will usually only apply to record IDs. If you want to list out attributes, try LIST: LIST INVENTORY PROD_NAME PRICE QTY SAMPLE, for instance, will return the first 10 product names, prices and quantities.

Use a LIKE clause in part of an INNER JOIN

Can/Should I use a LIKE criteria as part of an INNER JOIN when building a stored procedure/query? I'm not sure I'm asking the right thing, so let me explain.
I'm creating a procedure that is going to take a list of keywords to be searched for in a column that contains text. If I was sitting at the console, I'd execute it as such:
SELECT Id, Name, Description
FROM dbo.Card
WHERE Description LIKE '%warrior%'
OR
Description LIKE '%fiend%'
OR
Description LIKE '%damage%'
But a trick I picked up a little while ago to do "strongly typed" list parsing in a stored procedure is to parse the list into a table variable/temporary table, converting it to the proper type and then doing an INNER JOIN against that table in my final result set. This works great when sending, say, a list of integer IDs to the procedure. I wind up having a final query that looks like this:
SELECT Id, Name, Description
FROM dbo.Card
INNER JOIN #tblExclusiveCard ON dbo.Card.Id = #tblExclusiveCard.CardId
I want to use this trick with a list of strings. But since I'm looking for a particular keyword, I am going to use the LIKE clause. So ideally I'm thinking I'd have my final query look like this:
SELECT Id, Name, Description
FROM dbo.Card
INNER JOIN #tblKeyword ON dbo.Card.Description LIKE '%' + #tblKeyword.Value + '%'
Is this possible/recommended?
Is there a better way to do something like this?
The reason I'm putting wildcards on both ends of the clause is because there are "archfiend", "beast-warrior", "direct-damage" and "battle-damage" terms that are used in the card texts.
I'm getting the impression that depending on the performance, I can either use the query I specified or use a full-text keyword search to accomplish the same task?
Other than having the server do a text index on the fields I want to text search, is there anything else I need to do?
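To make that concrete, here's a minimal sketch of the list-to-table trick applied to strings, assuming SQL Server and a comma-delimited @Keywords parameter (STRING_SPLIT needs SQL Server 2016 or later; older versions would need a manual split function):
DECLARE @Keywords VARCHAR(MAX) = 'warrior,fiend,damage'

CREATE TABLE #tblKeyword (Value VARCHAR(100))

INSERT INTO #tblKeyword (Value)
SELECT value FROM STRING_SPLIT(@Keywords, ',')

-- DISTINCT avoids returning the same card twice when several keywords match it
SELECT DISTINCT c.Id, c.Name, c.Description
FROM dbo.Card c
INNER JOIN #tblKeyword k ON c.Description LIKE '%' + k.Value + '%'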
Try this
select * from Table_1 a
left join Table_2 b on b.type LIKE '%' + a.type + '%'
This practice is not ideal. Use with caution.
Your first query will work but will require a full table scan because any index on that column will be ignored. You will also have to do some dynamic SQL to generate all your LIKE clauses.
Try a full-text search if you're using SQL Server, or check out one of the Lucene implementations. Joel talked about his success with it recently.
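As an illustration, assuming a full-text index has already been created on dbo.Card.Description, the full-text version of the search might look like:
SELECT Id, Name, Description
FROM dbo.Card
WHERE CONTAINS(Description, '"warrior" OR "fiend" OR "damage"')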
Try it...
select * from table11 a inner join table2 b on b.id like '%' + a.id + '%' where a.city = 'abc'
It works for me. :-)
It seems like you are looking for full-text search, because you want to query a set of keywords against the card description and find any hits. Correct?
Personally, I have done it before, and it has worked out well for me. The only issue I could see is possibly with an unindexed column, but I think you would have the same issue with a WHERE clause.
My advice to you is just look at the execution plans between the two. I'm sure that it will differ which one is better depending on the situation, just like all good programming problems.
#Dillie-O
How big is this table?
What is the data type of Description field?
If either are small a full text search will be overkill.
#Dillie-O
Maybe not the answer you were looking for, but I would advocate a schema change...
proposed schema:
create table name(
nameID int identity primary key
,name varchar(50))
create table description(
descID int identity primary key
,[desc] varchar(50)) --something reasonable; and to make the most of it, always lower-case your values
create table nameDescJunc(
nameID int
,descID int)
This will let you use indexes without having to implement a bolt-on solution, and keeps your data atomic.
related: Recommended SQL database design for tags or tagging
"a trick I picked up a little while ago to do "strongly typed" list parsing in a stored procedure is to parse the list into a table variable/temporary table"
I think what you might be alluding to here is to put the keywords to include into a table then use relational division to find matches (could also use another table for words to exclude). For a worked example in SQL see Keyword Searches by Joe Celko.
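A rough sketch of that idea against the tables from the question; note that this requires every keyword to match (relational division), which is different from the OR semantics of the original query:
SELECT c.Id, c.Name, c.Description
FROM dbo.Card c
INNER JOIN #tblKeyword k ON c.Description LIKE '%' + k.Value + '%'
GROUP BY c.Id, c.Name, c.Description
HAVING COUNT(*) = (SELECT COUNT(*) FROM #tblKeyword)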
Performance will depend on the actual server that you use, on the schema of the data, and on the amount of data. With current versions of MS SQL Server, that query should run just fine (MS SQL Server 7.0 had issues with that syntax, but it was addressed in SP2).
Have you run that code through a profiler? If the performance is fast enough and the data has the appropriate indexes in place, you should be all set.
LIKE '%fiend%' will never use an index seek, but LIKE 'fiend%' will. Put simply, a leading-wildcard search is not sargable.
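To illustrate, assuming an index exists on Description, the first of these can use an index seek while the second forces a scan:
SELECT Id FROM dbo.Card WHERE Description LIKE 'fiend%'  -- seek possible
SELECT Id FROM dbo.Card WHERE Description LIKE '%fiend%' -- scan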
Try this:
SELECT Id, Name, Description
FROM dbo.Card
INNER JOIN #tblKeyword ON dbo.Card.Description LIKE CONCAT('%', #tblKeyword.Value, '%')