Getting the first of a GROUP BY clause in SQL - sql

I'm trying to implement single-column regionalization for a Rails application and I'm running into some major headaches with a complex SQL need. For this system, a region can be represented by a country code (e.g. us), an uppercase continent code (e.g. NA), or NULL, indicating the "default" information. I need to group these items by some relevant information such as a foreign key (we'll call it external_id).
Given a country and its continent, I need to be able to select only the most specific region available. So if records exist with the country code, I select them. If not, I want the records with the continent code. If not that, I want the records with a NULL code so I can receive the default values.
So far I've figured that I may be able to use a generated CASE statement to get an arbitrary sort order. Something like this:
SELECT *, CASE region
WHEN 'us' THEN 1
WHEN 'NA' THEN 2
ELSE 3
END AS region_sort
FROM my_table
WHERE region IN ('us','NA') OR region IS NULL
GROUP BY external_id
ORDER BY region_sort
The problem is that without an aggregate function the actual data returned by the GROUP BY for a given row seems to be untameable. How can I massage this query to make it return only the first record of the region_sort ordered groups?

Only the first record or only the first group of records? It's not even clear from what you've written whether there is more than one record or not.
In any case, it seems you are bending over backwards to do this in one query, but the database structure is not optimized for this. If there are just 3 levels and you want the most specific, why not just:
SELECT * FROM my_table WHERE region = 'us' GROUP BY external_id
If that returns something, you stop; otherwise you run 1 or 2 more queries conditionally.
I could be wrong, but my instinct says that will yield much better overall performance, though I suppose it depends on the particulars of your DB.
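For comparison, here is a minimal single-query sketch of the "most specific region per external_id" idea from the question. It assumes SQLite (run through Python's sqlite3 module) rather than the asker's actual database, and the label column and sample rows are made up for illustration. The CASE ranking is reused to find the best (lowest) rank per external_id, and the table is joined back to keep only the rows at that rank.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE my_table (external_id INTEGER, region TEXT, label TEXT);
-- Hypothetical sample data: 'label' is an invented payload column.
INSERT INTO my_table VALUES
    (1, 'us', 'country'), (1, 'NA', 'continent'), (1, NULL, 'default'),
    (2, 'NA', 'continent'), (2, NULL, 'default'),
    (3, NULL, 'default'), (3, 'fr', 'other country');
""")

def rank(column):
    # Same ordering as the CASE in the question: country, continent, default.
    return f"CASE {column} WHEN 'us' THEN 1 WHEN 'NA' THEN 2 ELSE 3 END"

query = f"""
SELECT t.external_id, t.region, t.label
FROM my_table AS t
JOIN (
    -- Best available rank for each external_id.
    SELECT external_id, MIN({rank('region')}) AS best
    FROM my_table
    WHERE region IN ('us', 'NA') OR region IS NULL
    GROUP BY external_id
) AS b ON b.external_id = t.external_id
WHERE (t.region IN ('us', 'NA') OR t.region IS NULL)
  AND {rank('t.region')} = b.best
ORDER BY t.external_id
"""
most_specific = conn.execute(query).fetchall()
print(most_specific)
```

If several rows share the winning region for one external_id, all of them come back, which corresponds to "the first group of records" rather than a single record.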

Related

Determine the count of rows returned

Okay, so I have been trying to find out if it's possible to return how many times a particular piece of data is returned.
Event_id
---------
Change
Change
Change
Problem
Task
So I want to find out how many times each string value is returned and output a count; for example, for Change I would expect 3, and so on.
I was hoping this would be possible in a WHERE clause, but I have never used COUNT, so I'm unsure how it all works.
Sounds like a classic COUNT function, which requires a GROUP BY clause; that clause should contain all non-aggregated columns (event_id in this case).
select event_id,
count(*)
from your_table
group by event_id
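As a quick check of the query above, here is a small sketch that loads the five sample values from the question into an in-memory SQLite database (an assumption; the asker's database is not stated) and runs the same GROUP BY. The ORDER BY is added only to make the printed order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (event_id TEXT)")
conn.executemany(
    "INSERT INTO your_table VALUES (?)",
    [("Change",), ("Change",), ("Change",), ("Problem",), ("Task",)],
)

# One output row per distinct event_id, with the number of rows in each group.
counts = conn.execute(
    "SELECT event_id, COUNT(*) FROM your_table "
    "GROUP BY event_id ORDER BY event_id"
).fetchall()
print(counts)
```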

How to get data based on two columns from same table in SQL

I wanted to retrieve some data from a table based on two columns see the below table structure
Update
I want the output data based on two conditions:
1. The code value is 'Web' or 'Offline'.
2. The Memo column holds the same data as the Pre_memo column.
Output should be as shown below
So far i got the output by using same table two times but i wanted to get the output result by using the table only 1 time to avoid performance related issues as this table is having huge data.
select distinct OrderTable.Memo,
max(OrderTable.Memo_Date) as Date1,
max(ot.Pre_Memo_Date) as Date2
from OrderTable,
OrderTable ot
where OrderTable.code in ('Web')
and ot.code in ('Offline')
and OrderTable.Memo = ot.Pre_Memo
group by OrderTable.Memo
Can anyone help with this? I'd like to use OrderTable only once in the query and filter on the Memo and Pre_memo columns, since they hold the same data.
You can use UNION ALL and do conditional aggregation:
select Memo, max(case when code = 'Offline' then Date end) as Memo_date,
max(case when code = 'Web' then Date end) as Per_Memo_date
from (select Date, 'Web' as code, Pre_memo as Memo
from OrderTable o
where code = 'Web'
union all
select Date, 'Offline', Memo
from OrderTable o
where code = 'Offline'
) t
group by Memo;
"I wanted to retrieve some data from a table based on two columns see the below table structure"
Providing a sample is sufficient to illustrate the problem (and it is desirable to do so on SO) but it is not sufficient and thus not a replacement for defining the problem, which you have failed to do.
Absent such definition of the problem, we can only guess what you're trying to achieve. E.G.
from the subset of tuples that have 'Offline' for 'code' value, take the MAX() 'Date' value per appearing value of 'Memo'.
Match that (using some matching condition) to the subset of tuples that have 'Web' for 'code' value and retain the 'Date' value from those as 'Memo_date' in the result set.
matching condition being that 'Memo' value of [a tuple in] the former is equal to 'Pre_memo' value in [the matching tuple in] the latter.
If all that is correct, then that explains why it is impossible to do this in SQL without having at least two references. You cannot avoid doing some kind of matching, and matching by definition takes two distinct things to match (even if the two distinct things are distinct subsets of one and the same thing). In fact it is almost certainly a fundamental design mistake for you to have those two distinct things in one single table, probably under the totally misguided belief that "having everything in one table makes things easier".
"So far i got the output by using same table two times but i wanted to get the output result by using the table only 1 time to avoid performance related issues as this table is having huge data"
From the way you have presented the question, I suspect that you were hoping for some means to exploit the fact that those 'Offline' tuples are "the next" after a 'Web' tuple, and that you could write the SQL in such a way that the engine could then derive a sort of "single pass" algorithm (which you probably assume would go faster).
It does not work like that. SQL tables have no inherent ordering and as a consequence there simply ain't no such thing as "the next" in a table.
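To make the "two references" point concrete, here is a hedged sketch of the question's self-join written with explicit JOIN syntax. It assumes SQLite via Python's sqlite3, column names taken from the question's query (Memo, Memo_Date, Pre_Memo, Pre_Memo_Date, code), and invented sample rows. The index on Pre_Memo is only a guess at something that might help the matching step; whether it does depends on the engine and the data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE OrderTable (
    code TEXT, Memo TEXT, Memo_Date TEXT, Pre_Memo TEXT, Pre_Memo_Date TEXT
);
-- Hypothetical rows: two 'Offline' orders point back at the 'Web' memo M1.
INSERT INTO OrderTable VALUES
    ('Web',     'M1', '2020-01-01', NULL, NULL),
    ('Offline', 'M2', '2020-01-05', 'M1', '2020-01-03'),
    ('Offline', 'M3', '2020-02-01', 'M1', '2020-01-20'),
    ('Web',     'M4', '2020-03-01', NULL, NULL);
-- Assumption: an index on the matching column may speed up the join.
CREATE INDEX idx_order_pre_memo ON OrderTable (Pre_Memo);
""")

# The table appears twice (aliases w and o) because matching needs two sides.
matched = conn.execute("""
    SELECT w.Memo,
           MAX(w.Memo_Date)     AS Date1,
           MAX(o.Pre_Memo_Date) AS Date2
    FROM OrderTable AS w
    JOIN OrderTable AS o ON o.Pre_Memo = w.Memo
    WHERE w.code = 'Web' AND o.code = 'Offline'
    GROUP BY w.Memo
""").fetchall()
print(matched)
```

The GROUP BY already yields one row per Memo, so the DISTINCT in the original query is not needed in this form.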

MS-Access 2007: Query for names that have two or more different values in another field

Hello & thank you in advance.
I have an access db that has the following information about mammals we captured. Each capture has a unique ID, which is the capture table's primary key: "capture_id". The mammals (depending on species) have ear tags that we use to track them from year to year and day to day. These are in a field called "id_code". I have the sex of the mammal as it was recorded at capture time in another field called sex.
I want a query that will return all instances of an id_code IF the sex changes even once for that id.
Example: Animal E555 was caught 4 times, 3 times someone recorded this animal as a F and once as a M.
I've managed to get it to display this info by stacking about 5 queries on top of each other (Query for recaptured animals -> Query for all records of animals from 1st query -> Query for unique combo of id & sex (via just using those two columns & requiring "Unique Values") -> Query that pulls only duplicate id values from that last one and pulls back up all capture records of those ids). However, this is clearly not the right way to do this: it is not updateable (which I need, since this is for data quality control), and for some reason it also returns duplicates of each of those records...
I realize that this could be solved two other ways:
Using R to pull up these records (I want none of this data to have to leave the database though, because we're working on getting it into one place after 35 years of collecting! And my boss can't use R and I'm seasonal, so I want him to just have to open a query)
Creating a table that tracks all animal id's as an animal index. However, this would make entering the data more difficult and also require someone to go back through 20,000 records and create a brand new animal id for every one because you can't give ear tags to voles & things so they don't get a unique identifier in the field.
Help!
It is quite simple to do with a single query. As a bonus, the query will be updatable, not duplicated, and simple to use:
SELECT mammals.ID, mammals.Sex, mammals.id_code, mammals.date_recorded
FROM mammals
WHERE mammals.id_code In
(select id_code from
(select distinct id_code, sex from [mammals]) a
group by id_code
having count(*)>1
);
The reason why you see a sub-query inside a sub-query is because Access does not support COUNT(DISTINCT). With any other "normal" database you would write:
SELECT mammals.ID, mammals.Sex, mammals.id_code, mammals.date_recorded
FROM mammals
WHERE mammals.id_code In
(select id_code
from [mammals]
group by id_code
having count(DISTINCT Sex)>1
);
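A small sketch to show that the two forms select the same rows. It runs in SQLite through Python's sqlite3 (an assumption made only so the example is runnable; SQLite happens to accept both forms), with made-up capture rows based on the E555 example from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mammals (ID INTEGER PRIMARY KEY, Sex TEXT, id_code TEXT, date_recorded TEXT);
-- Hypothetical captures: E555 was recorded as F three times and M once.
INSERT INTO mammals (Sex, id_code, date_recorded) VALUES
    ('F', 'E555', '2010-06-01'), ('F', 'E555', '2010-06-02'),
    ('M', 'E555', '2011-06-01'), ('F', 'E555', '2012-06-01'),
    ('M', 'E556', '2010-06-01'), ('M', 'E556', '2010-06-03'),
    ('F', 'E557', '2010-06-04');
""")

# Form 1: nested sub-queries, as needed where COUNT(DISTINCT ...) is unavailable.
nested = conn.execute("""
    SELECT ID, Sex, id_code FROM mammals
    WHERE id_code IN (
        SELECT id_code FROM (SELECT DISTINCT id_code, Sex FROM mammals) AS a
        GROUP BY id_code HAVING COUNT(*) > 1
    )
    ORDER BY ID
""").fetchall()

# Form 2: COUNT(DISTINCT Sex) directly in the HAVING clause.
count_distinct = conn.execute("""
    SELECT ID, Sex, id_code FROM mammals
    WHERE id_code IN (
        SELECT id_code FROM mammals
        GROUP BY id_code HAVING COUNT(DISTINCT Sex) > 1
    )
    ORDER BY ID
""").fetchall()

print(nested)
```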

SQL or statement vs multiple select queries

I have a table with an id and a name.
I'm getting a list of ids and I need their names.
To my knowledge I have two options.
Create a for loop in my code which executes:
SELECT name from table where id=x
where x is always a number.
Or I write a single query like this:
SELECT name from table where id=1 OR id=2 OR id=3
The list of ids and names is enormous, so I think you wouldn't want that.
The problem with the ids is that an id is not always a number but a randomly generated id containing numbers and characters. So talking about ranges is not a solution.
I'm asking this from a performance point of view.
What's a nice solution for this problem?
SQLite has limits on the size of a query, so if there is no known upper limit on the number of IDs, you cannot use a single query.
When you are reading multiple rows (note: IN (1, 2, 3) is easier than many ORs), you don't know to which ID a name belongs unless you also SELECT that, or sort the results by the ID.
There should be no noticeable difference in performance; SQLite is an embedded database without client/server communication overhead, and the query does not need to be parsed again if you use a prepared statement.
A "nice" solution is using the IN operator:
SELECT name from table where id in (1,2,3)
Also, the IN operator is syntactic sugar built for exactly this purpose.
SELECT name from table where id IN (1,2,3,4,5,6.....)
Assuming you receive the list of IDs to look up as an input temp table #InputIDTable:
SELECT name from table WHERE ID IN (SELECT id from #InputIDTable)
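Putting the points above together, here is a minimal sketch for the SQLite case using Python's sqlite3. The table name items, the helper fetch_names, and the chunk size of 500 are all assumptions for illustration. It uses IN with bound parameters, splits the id list into chunks so no single statement grows without limit, and selects id alongside name so each name can be tied back to its id.

```python
import sqlite3

def fetch_names(conn, ids, chunk_size=500):
    """Return {id: name} for the given ids, querying in chunks."""
    names = {}
    ids = list(ids)
    for start in range(0, len(ids), chunk_size):
        chunk = ids[start:start + chunk_size]
        # One "?" placeholder per id; the values themselves are bound, not inlined.
        placeholders = ", ".join("?" * len(chunk))
        sql = f"SELECT id, name FROM items WHERE id IN ({placeholders})"
        names.update(conn.execute(sql, chunk))
    return names

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id TEXT PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [("a1", "Alpha"), ("b2", "Beta"), ("c3", "Gamma"), ("d4", "Delta")],
)

# chunk_size=2 here only to exercise the chunking on a tiny table.
result = fetch_names(conn, ["a1", "c3", "d4", "zz"], chunk_size=2)
print(result)
```

Ids with no matching row (zz above) are simply absent from the returned dictionary.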

SQL query using fn_Split to find multiple values in column

This is a followup to my earlier question which got me part of the way to the goal.
Here is what I'm starting with:
SELECT * FROM MyTable WHERE County IN (SELECT value From dbo.fn_Split(@counties, ','))
Here's the scenario: In my table I've got a column named County. Each record can have multiple counties in the County column delimited with commas (I know this is bad form, I didn't do it). For example: county1, county22, county41. Some records may have only one county (say county13), others might have all the counties. So: county1, county2, county3... through county45 (yes, it's terrible, I know).
In the app I'm trying to build users can select multiple counties or even all counties in the same format as above (county1, county2, county3...). Thanks to Martin's help in the previous question I can get it to return records that have each of the counties individually but not the records that might contain all of the counties.
For example: The user selects county4, county26. I need to have the records returned that have just county4 and county26 as well as any that might contain both of them as part of a larger set (like all 45 of them).
Hope this is clear and I didn't make it more convoluted than necessary. Any assistance is very, very, very much appreciated!
To illustrate:
County
Record1 county1, county14, county26
Record2 county14
Record3 county1, county2, ... through county45
User Submission: county1, county26
Returns: Record1 and Record3
Not sure if I understood the question, but here is how I interpret it:
You need to return rows from your table for one or more selected items.
Also, you want to be able to select ALL items at once without passing the whole list.
I'd do that using Stored Procedure with 2 parameters:
@Selection SMALLINT
@TVP_County CountyTableType (a table-valued parameter; see: https://msdn.microsoft.com/en-us/library/bb510489.aspx)
If @Selection = 1, you join @TVP_County with your table to get the results.
If @Selection = 0, you return ALL records from your table without a join and do not use @TVP_County at all.
If @Selection = -1, you exclude the @TVP_County items from your table. That lets you do a reverse selection: the user selects just the few counties he/she does not want to see.
Of course, within the stored procedure you have to implement the logic to run three different queries depending on the first parameter.
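For the matching rule in the question's illustration (return every record whose County list contains all of the selected counties), one possible approach is to split the delimited column into one row per record and county, then keep the records that match as many distinct counties as were selected. The sketch below is not the stored procedure described above: it assumes SQLite via Python's sqlite3, does the splitting in Python instead of dbo.fn_Split, and uses a hypothetical record_county table and records_with_all helper.

```python
import sqlite3

# Sample data from the question's illustration.
records = {
    "Record1": "county1, county14, county26",
    "Record2": "county14",
    "Record3": ", ".join(f"county{i}" for i in range(1, 46)),
}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE record_county (record TEXT, county TEXT)")
for record, county_list in records.items():
    # Split the comma-delimited value into one row per county.
    conn.executemany(
        "INSERT INTO record_county VALUES (?, ?)",
        [(record, county.strip()) for county in county_list.split(",")],
    )

def records_with_all(conn, selected):
    """Return the records that contain every county in `selected`."""
    selected = sorted(set(selected))
    placeholders = ", ".join("?" * len(selected))
    sql = f"""
        SELECT record FROM record_county
        WHERE county IN ({placeholders})
        GROUP BY record
        HAVING COUNT(DISTINCT county) = ?
        ORDER BY record
    """
    return [row[0] for row in conn.execute(sql, selected + [len(selected)])]

matches = records_with_all(conn, ["county1", "county26"])
print(matches)
```

Record2 is excluded because it holds neither selected county, while Record3 qualifies because its full list includes both.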