OK, I've googled and googled and still can't get this.
Effectively, in a table containing several hundred thousand rows, one column has a unique identifier (not a PK and not really unique, but hey) and another has numerical values.
The unique identifier (UI) is unique only within that table and is sort of incremental, in that the highest number signifies the most recent table entry.
Effectively, I need to break the rows down to relevant rows using a WHERE clause, then get the most recent UI of those rows together with the SUM of the values of those rows.
i.e. if UI are 1, 3, 5, 7, 10 and the corresponding values for the aggregate function are 100, 300, 500, 700 and 1000, what I need to have as query result is UI 10, Sum 2600.
DB is SQL2000
How do I achieve this?
It sounds like you need the max identifier and the sum computed over the same set of rows. Aggregates operate on whatever rows your WHERE clause keeps, so one query does both:
SELECT MAX(ID), SUM(Number) FROM TableName WHERE <your filter>
ID would be your Unique Identifier column name.
Number would be your column name that holds the numbers.
TableName is the name of your table.
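If you want to sanity-check the idea before running it against SQL Server 2000, here's a small sketch using SQLite from Python. The `entries` table, its columns, and the `category` filter are all invented for the demo; only the MAX-plus-SUM pattern is the point:

```python
import sqlite3

# Hypothetical schema: ui is the pseudo-incremental identifier,
# value is the number to sum, category drives the WHERE clause.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entries (ui INTEGER, value INTEGER, category TEXT)")
conn.executemany(
    "INSERT INTO entries VALUES (?, ?, ?)",
    [(1, 100, 'a'), (3, 300, 'a'), (5, 500, 'a'), (7, 700, 'a'), (10, 1000, 'a'),
     (2, 999, 'b')],  # a row the WHERE clause should exclude
)

# MAX and SUM aggregate over exactly the rows the WHERE clause keeps,
# so one query returns both the latest UI and the total.
row = conn.execute(
    "SELECT MAX(ui), SUM(value) FROM entries WHERE category = 'a'"
).fetchone()
print(row)  # (10, 2600)
```

This matches the example in the question: UIs 1, 3, 5, 7, 10 with values 100 through 1000 yield (10, 2600).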
I am struggling to understand what the output of SELECT is meant to be in SQL (I am using MS ACCESS), and what sort of criteria this output needs to satisfy, if any. As a result, I don't understand why some queries work and others don't. I know it retrieves data from a table, does calculations with it and displays it, but I don't understand the "inner" workings of the SELECT statement. For instance, what is the data structure / entity it displays called? Is it a "new" table?
And for example, suppose I have a table called "table_name", with 5 columns. One of the columns called "column_3", and there are 20 records.
SELECT column_3, COUNT(*) AS Count
FROM table_name;
Why does this query fail to run? By logic, I would expect it to display two columns: the first column would be "column_3", containing 20 rows of relevant data, and the second would be "Count", containing just one non-empty row (displaying 20), with the other 19 rows empty (or maybe NULL)?
Is it because SELECT is meant to produce equal number of rows for each column?
Your questions involve a basic understanding of SQL. A SELECT statement does not create a table; it returns a virtual result set. Nothing is persisted unless you feed the result into something like INSERT ... SELECT or SELECT ... INTO.
In your example, you need to "tell" the SQL engine what you want a count "of". Because you added column_3 to the select list, you need to group by it:
SELECT column_3, COUNT(*) AS Count
FROM table_name
GROUP BY column_3
If you wanted a count of all the rows, simply:
SELECT COUNT(*) FROM table_name
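To see both behaviors side by side, here's a small SQLite sketch in Python (SQLite rather than Access, and the 20 rows are invented to match the question's setup):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (column_3 TEXT)")
conn.executemany("INSERT INTO table_name VALUES (?)",
                 [('x',)] * 12 + [('y',)] * 8)  # 20 rows total

# With GROUP BY: one output row per distinct column_3 value,
# each with its own count. Every column in the result is the same length.
grouped = conn.execute(
    "SELECT column_3, COUNT(*) AS cnt FROM table_name "
    "GROUP BY column_3 ORDER BY column_3"
).fetchall()
print(grouped)  # [('x', 12), ('y', 8)]

# Without GROUP BY, the aggregate collapses everything to a single row.
total = conn.execute("SELECT COUNT(*) FROM table_name").fetchone()
print(total)  # (20,)
```

The "ragged" result the question imagined (20 values in one column, 1 in the other) can't happen: a result set is always rectangular, which is exactly why the original query is rejected.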
I think I have a table that lacks a true primary key and I need to make one in the output. I cannot modify the table.
I need to run a select query to generate a list of values (list_A), then take those values and query them to show all the records related to them. From those records, I do another select to extract a now visible list called list_B. From list_B, I can search them to reveal all the records related to the original list (list_A), with many of those records missing the values from list_A but still need to be counted.
Here's my process so far:
I declared a sequence called 'temp_key', which starts from 1 and increments by 1.
I add a field called 'temp_key' to the parent query, so that it will hopefully show which element of the original list_A sub-query the resulting records are related to.
I run into trouble because I don't know how to make the temp_key increment as the list_A sub-query moves from the beginning to end of all the values in the list.
SELECT currval(temp_key) AS temp_key, list_A, list_B
FROM table
WHERE list_B IN (SELECT DISTINCT list_B
                 FROM table
                 WHERE list_A IN (SELECT DISTINCT list_A
                                  FROM table));
As it is now, the above query doesn't work because there seems to be no way to make the current value of temp_key increment upward as it goes through values from the list originally generated from the lowest level sub-query (list_A).
For example, there might be only 10 values in list_A. And the output could have 100s of records, all labeled 1 through 10, with many of those values missing values in the list_A field. But they still need to be labeled 1 through 10 because the values of list_B connect the two sets.
Maybe you can create a new primary key column first with the following code (concatenating a row number with list_A). A window function such as DENSE_RANK() can also stand in for the sequence, since it numbers the distinct list_A values 1 through N without any currval bookkeeping:
WITH T AS (
SELECT DENSE_RANK() OVER (ORDER BY list_A) AS temp_key, list_A, list_B,
CONCAT(ROW_NUMBER() OVER (PARTITION BY list_A ORDER BY list_B), list_A) AS Prim_Key
FROM table )
SELECT * FROM T
Then you can specify in the WHERE clause which keys you want to select.
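Here's a runnable sketch of this window-function approach, using SQLite (3.25+ for window functions) from Python. The table name, sample rows, and key format are invented; note SQLite concatenates with || where other engines may use CONCAT():

```python
import sqlite3  # window functions need SQLite 3.25+

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (list_A TEXT, list_B TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [('a1', 'b1'), ('a1', 'b2'), ('a2', 'b1'), ('a2', 'b3')])

# DENSE_RANK numbers the distinct list_A values 1..N (the temp_key the
# question wanted), and ROW_NUMBER || list_A builds a unique synthetic key.
rows = conn.execute("""
    SELECT DENSE_RANK() OVER (ORDER BY list_A) AS temp_key,
           ROW_NUMBER() OVER (PARTITION BY list_A ORDER BY list_B) || list_A
               AS prim_key,
           list_A, list_B
    FROM t
    ORDER BY list_A, list_B
""").fetchall()
for r in rows:
    print(r)
# (1, '1a1', 'a1', 'b1')
# (1, '2a1', 'a1', 'b2')
# (2, '1a2', 'a2', 'b1')
# (2, '2a2', 'a2', 'b3')
```

Every row tied to the same list_A value gets the same temp_key, which is the "labeled 1 through 10" behavior the question describes.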
I am attempting to return the row of the highest value for timestamp (an integer) for each person (that has multiple entries) in a table. Additionally, I am only interested in rows with the field containing ABCD, but this should be done after filtering to return the latest (max timestamp) entry for each person.
SELECT table."person", max(table."timestamp")
FROM table
WHERE table."type" = 1
HAVING table."field" LIKE '%ABCD%'
GROUP BY table."person"
For some reason, I am not receiving the data I expect. The returned table is nearly twice the size of expectation. Is there some step here that I am not getting correct?
You can first build a derived table holding MAX(timestamp) per person, then join it back in an outer query; that way the ABCD filter runs only after each person's latest entry has been found, which is the order you asked for:
SELECT t."person", t."timestamp"
FROM (SELECT "person", MAX("timestamp") AS max_ts
      FROM table
      WHERE "type" = 1
      GROUP BY "person") m
JOIN table t ON t."person" = m."person" AND t."timestamp" = m.max_ts
WHERE t."field" LIKE '%ABCD%'
Direct answer: as I understand your end goal, just move the HAVING clause to the WHERE section:
SELECT
table."person", MAX(table."timestamp")
FROM table
WHERE
table."type" = 1
AND table."field" LIKE '%ABCD%'
GROUP BY table."person";
This should return no more than 1 row per table."person", with their associated maximum timestamp.
As an aside, I'm surprised your query ran at all: your HAVING clause referenced a column that is neither grouped nor aggregated in your query. From the documentation (and my experience):
The fundamental difference between WHERE and HAVING is this: WHERE selects input rows before groups and aggregates are computed (thus, it controls which rows go into the aggregate computation), whereas HAVING selects group rows after groups and aggregates are computed.
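The difference between filtering before and after the per-person maximum is easy to see with a toy dataset; here's a sketch in Python with SQLite (table and column names invented, ts standing in for the integer timestamp):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log (person TEXT, ts INTEGER, type INTEGER, field TEXT)")
conn.executemany("INSERT INTO log VALUES (?, ?, ?, ?)", [
    ('alice', 1, 1, 'xx'),
    ('alice', 5, 1, 'ABCD-match'),   # alice's latest type-1 row
    ('bob',   2, 1, 'ABCD-match'),
    ('bob',   9, 1, 'no match'),     # bob's latest row lacks ABCD
])

# Filter AFTER the max: find each person's newest type-1 row first,
# then keep only those whose field contains ABCD. bob drops out.
after = conn.execute("""
    SELECT t.person, t.ts
    FROM (SELECT person, MAX(ts) AS max_ts FROM log
          WHERE type = 1 GROUP BY person) m
    JOIN log t ON t.person = m.person AND t.ts = m.max_ts
    WHERE t.field LIKE '%ABCD%'
""").fetchall()
print(after)   # [('alice', 5)]

# Filter BEFORE the max: bob's older ABCD row now wins for him.
before = conn.execute("""
    SELECT person, MAX(ts) FROM log
    WHERE type = 1 AND field LIKE '%ABCD%'
    GROUP BY person ORDER BY person
""").fetchall()
print(before)  # [('alice', 5), ('bob', 2)]
```

Which result is "right" depends on the requirement; the question explicitly asked for the filter-after-max behavior.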
I'm extracting data from several databases and want to keep track of duplicate records without purging them. My solution is to create a new sequence field, where duplicate rows are marked by sharing the same sequence number. Keep in mind that not all columns have to be equal for rows to be considered duplicates.
How do I do this? My goal is to have this table with all duplicate records intact, and finally another table where I would only have unique records by merging those with same sequence ID.
Try this:
select t.*, Sequence_ID=DENSE_RANK() over (
order by <fields_you_want_to_test_for_uniqueness>
)
from <your_table> t
Note that DENSE_RANK() gives you identical values for a "tie", but also gives you consecutive numbers (e.g. 1, 2, 3, 3, 4), whereas RANK() gives you the same value for a "tie", but then skips numbers (e.g. 1, 2, 3, 3, 5). Choose whichever one suits your needs.
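Here's a runnable sketch of the DENSE_RANK tagging and the follow-up merge, using SQLite from Python; the recs table, the (name, email) uniqueness columns, and the MIN(source) merge rule are all invented stand-ins:

```python
import sqlite3  # window functions need SQLite 3.25+

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE recs (name TEXT, email TEXT, source TEXT)")
conn.executemany("INSERT INTO recs VALUES (?, ?, ?)", [
    ('ann', 'a@x', 'db1'),
    ('ann', 'a@x', 'db2'),   # duplicate of the row above on (name, email)
    ('bob', 'b@x', 'db1'),
])

# Rows that tie on the chosen columns share a sequence_id; all rows survive.
tagged = conn.execute("""
    SELECT name, email, source,
           DENSE_RANK() OVER (ORDER BY name, email) AS sequence_id
    FROM recs ORDER BY sequence_id, source
""").fetchall()
print(tagged)
# [('ann', 'a@x', 'db1', 1), ('ann', 'a@x', 'db2', 1), ('bob', 'b@x', 'db1', 2)]

# One representative per sequence_id gives the merged, unique table.
unique = conn.execute("""
    SELECT sequence_id, MIN(source) AS source, name, email FROM (
        SELECT name, email, source,
               DENSE_RANK() OVER (ORDER BY name, email) AS sequence_id
        FROM recs)
    GROUP BY sequence_id, name, email
    ORDER BY sequence_id
""").fetchall()
print(unique)  # [(1, 'db1', 'ann', 'a@x'), (2, 'db1', 'bob', 'b@x')]
```

The merge step here just takes MIN(source) per group; in practice you'd pick whatever rule decides which duplicate's values win.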
I have a MS SQL DB with about 2,600 records (each one holding information about a computer). I need to write a SELECT statement that selects about 400 of those records.
What's the best way to do that when they don't have any common criteria? They're all just different random numbers so I can't use wildcards or anything like that. Will I just have to manually include all 400 numbers in the query?
If you need 400 specific rows where their column match a certain number:
Yes, include all 400 numbers using an IN clause. It's been my experience (via code profiling) that an IN clause is faster than a chain of column = A OR column = B OR ...
400 is really not a lot.
SELECT * FROM table WHERE column in (12, 13, 93, 4, ... )
If you need 400 random rows:
SELECT TOP 400 * FROM table
ORDER BY NEWID()
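For the random-sample case, here's a sketch in Python with SQLite; NEWID() is SQL Server-specific, so SQLite's ORDER BY RANDOM() with LIMIT plays its role here (the computers table is invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE computers (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO computers VALUES (?)",
                 [(i,) for i in range(2600)])

# Shuffle the rows by a random sort key, then keep the first 400.
# Equivalent in spirit to SQL Server's SELECT TOP 400 ... ORDER BY NEWID().
sample = conn.execute(
    "SELECT * FROM computers ORDER BY RANDOM() LIMIT 400"
).fetchall()
print(len(sample))  # 400
```

Note this sorts the whole table just to sample it, which is fine at 2,600 rows but gets expensive on large tables.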
Rather than executing multiple queries or selecting the entire rowset and filtering it yourself, create either a temporary table or a permanent table where you can insert temporary rows for each ID. In your main query, just join on that table.
For example, if your source table is...
person:
person_id
name
And you have 400 different person_id's you want, let's say we have a permanent table for our temporary rows, like this...
person_query:
query_id
person_id
You'd insert your rows into person_query, then execute your query like this:
select
*
from person p
join person_query pq on pq.person_id = p.person_id
where pq.query_id = #query_id
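This pattern is easy to try end to end; here's a sketch in Python with SQLite, using the person / person_query tables from above (the row contents and the four IDs standing in for the ~400 are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO person VALUES (?, ?)",
                 [(i, f'user{i}') for i in range(1, 1001)])

# The "temporary rows" table: query_id groups one batch of wanted IDs,
# so several concurrent lookups can share the table.
conn.execute("CREATE TABLE person_query (query_id INTEGER, person_id INTEGER)")
wanted = [12, 13, 93, 4]  # stand-in for the ~400 IDs
conn.executemany("INSERT INTO person_query VALUES (1, ?)",
                 [(i,) for i in wanted])

# Join instead of a giant IN list; the batch is selected by query_id.
rows = conn.execute("""
    SELECT p.person_id, p.name
    FROM person p
    JOIN person_query pq ON pq.person_id = p.person_id
    WHERE pq.query_id = 1
    ORDER BY p.person_id
""").fetchall()
print(rows)  # [(4, 'user4'), (12, 'user12'), (13, 'user13'), (93, 'user93')]
```

In a real multi-user setup you'd generate a fresh query_id per batch and delete the batch's rows when done.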
Maybe you have found a deficiency in the database design. That is, there is something common amongst the 400 records you want and what you need is another column in the database to indicate this commonality. You could then select against this new column.
As Brian Bondy said above, using an IN clause is probably the best way:
SELECT * FROM table WHERE column in (12, 13, 93, 4, ... )
One good trick is to paste the IDs in from a spreadsheet, if you have one ...
If the IDs of the rows you want are in a spreadsheet, you can add an extra column that uses CONCATENATE() to append a comma to each ID, so that the column in your spreadsheet looks like this:
12,
13,
93,
4,
then copy and paste this column of data into your query, so it looks like this:
SELECT * FROM table WHERE column in (
12,
13,
93,
4,
...
)
It doesn't look pretty, but it's a quick way of getting all the numbers in.
You could create an XML list or something of the sort which would keep track of what you need to query, and then you could write a query that would iterate through that list bringing all of them back.
Here is a website that has numerous examples of performing what you are looking for in a number of different methods (#4 is the XML method).
You can create a table with those 400+ random tokens, and select on those. e.g.,
SELECT * FROM inventory WHERE inventory_id IN (SELECT id FROM inventory_ids WHERE tag = 'foo')
You still have to maintain the other table, but at least you're not having one ginormous query.
I would build a separate table with your selection criteria and then join the tables together, or something like that, assuming your criteria are static of course.
Just select the TOP n rows, and order by something random.
Below is a hypothetical example to return 10 random employee names:
SELECT TOP 10
EMP.FIRST_NAME
,EMP.LAST_NAME
FROM
Schema.dbo.Employees EMP
ORDER BY
NEWID()
For this specific situation (not necessarily as a general solution), the fastest and simplest thing is probably to read the entire SQL table into memory and find your matches in your program's code, rather than having the database parse a gigantic WHERE clause.