Remove all non-duplicate rows based on a single column in a SQL Server table when the SELECT statement has conditions on multiple columns

I am attempting to pull data from a SQL Server table that meets certain criteria. Part of those criteria is that multiple rows can share the same value in one column, and I need all of those rows returned. What I do not want returned are rows whose value in that column is unique.
I want to find sessions that fall in a specific date range, match one of two activity types, and have multiple rows, meaning two or more rows for the same session.
Example SQL query:
SELECT activity, message
FROM myTable
WHERE (date BETWEEN '1/1/2020' and '1/31/2020')
AND activity IN ('trace', 'info')
Can you advise how I can get the rows that are in the correct date range and have the correct activity, but only when there are multiple such rows? I want no data that does not meet all three criteria.
Update to Body:
In creating the example query in my initial post, I neglected to include the label column. The SELECT should read "SELECT activity, label, message FROM myTable WHERE (date BETWEEN '1/1/2020' AND '1/31/2020') AND activity IN ('trace', 'info')". Based on my sample data, I would expect the following result:
activity  message  label
--------  -------  -----
trace     logged   1234
info      written  1234
Label '1234' is the only value that meets all the criteria: it falls in the date range, matches the activity values, and has multiple rows.

With the limited information available, I can only make a guess; see whether something like the query below works for you.
SELECT activity,
message,
COUNT(*) AS Count
FROM myTable
WHERE date BETWEEN '1/1/2020' and '1/31/2020'
AND activity IN ('trace', 'info')
AND message IN ('logged', 'written')
GROUP BY activity,
message
HAVING COUNT(*) > 1
ORDER BY Count DESC
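The guess above groups by activity and message, so it returns counts rather than the original rows. Given the update that adds the label column, a minimal sketch of a label-based approach follows: apply the date and activity filters first, then keep only the rows whose label occurs more than once among the filtered rows. To stay self-contained it runs the SQL against an in-memory SQLite database through Python's sqlite3 module; the sample rows are hypothetical, and ISO date strings are assumed in place of SQL Server date literals. In SQL Server itself, COUNT(*) OVER (PARTITION BY label) in a derived table is another way to express the same filter.

```python
import sqlite3

# Hypothetical sample data: the post does not show its table, so these rows
# (and the use of an in-memory SQLite database) are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (date TEXT, activity TEXT, message TEXT, label TEXT)")
conn.executemany(
    "INSERT INTO myTable VALUES (?, ?, ?, ?)",
    [
        ("2020-01-05", "trace", "logged", "1234"),
        ("2020-01-09", "info", "written", "1234"),
        ("2020-01-12", "trace", "logged", "5678"),  # only one qualifying row
        ("2020-02-03", "info", "written", "5678"),  # outside the date range
        ("2020-01-20", "error", "failed", "9999"),  # wrong activity
        ("2020-01-21", "error", "failed", "9999"),  # wrong activity
    ],
)

# Filter once in a CTE, then keep rows whose label appears more than once
# among the filtered rows (the IN subquery finds those labels).
QUERY = """
WITH filtered AS (
    SELECT activity, message, label
    FROM myTable
    WHERE date BETWEEN '2020-01-01' AND '2020-01-31'
      AND activity IN ('trace', 'info')
)
SELECT activity, message, label
FROM filtered
WHERE label IN (SELECT label FROM filtered GROUP BY label HAVING COUNT(*) > 1)
ORDER BY activity DESC
"""
rows = conn.execute(QUERY).fetchall()
print(rows)  # [('trace', 'logged', '1234'), ('info', 'written', '1234')]
```

Label 9999 has two rows but is dropped by the activity filter, and label 5678 has only one row left after the date filter, which is why the filters are applied before counting.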

Related

Counting the number of times same record exist in a given period of time

I am trying to write a query to find out whether a record exists more than once in a given period of time and, if it does, how many times it is repeated.
To solve this, I have sorted the records:
select * from table_name where date >= ? and date <= ? order by email
I am now trying to count how many times each record exists, but I cannot figure out a way to do it.
Here is the problem. The image below shows the basic data structure.
Here is the expected output for a year:
The table above contains the name Xyz with email xyz@email.com three times, the name Abc with email abc@email.com twice, and the name Def with email def@email.com twice. I am trying to find out, with a single query, how many times each record is repeated in a given period of time. I thought about sorting and then using recursion on each record, counting until a different record appears, but recursing over every record seems expensive.
Is there a better way to solve this problem?
Group and count.
SELECT column_to_compare1, column_to_compare2, COUNT(*)
FROM table_name
WHERE [date] BETWEEN @date1 AND @date2
GROUP BY column_to_compare1, column_to_compare2
HAVING COUNT(*) > 1 -- IF YOU WANT TO ONLY INCLUDE RECORDS WITH DUPLICATES
BETWEEN is inclusive at both ends, so adjust your dates with DATEADD if you want to exclude the boundary dates.
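A minimal sketch of the group-and-count approach, run against an in-memory SQLite database through Python's sqlite3 module. The rows are hypothetical and only mirror the counts described in the question (the real table was shown as an image); the `?` placeholders take the role of @date1 and @date2, and dates are assumed to be stored as ISO strings so that BETWEEN compares them correctly.

```python
import sqlite3

# Hypothetical rows modelled on the question: Xyz three times, Abc and Def twice each.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (name TEXT, email TEXT, date TEXT)")
conn.executemany(
    "INSERT INTO table_name VALUES (?, ?, ?)",
    [
        ("Xyz", "xyz@email.com", "2005-01-10"),
        ("Xyz", "xyz@email.com", "2005-03-02"),
        ("Xyz", "xyz@email.com", "2005-12-31"),  # BETWEEN includes the end date
        ("Abc", "abc@email.com", "2005-04-15"),
        ("Abc", "abc@email.com", "2005-06-01"),
        ("Def", "def@email.com", "2005-02-20"),
        ("Def", "def@email.com", "2005-09-09"),
        ("Ghi", "ghi@email.com", "2005-05-05"),  # appears once: removed by HAVING
        ("Abc", "abc@email.com", "2006-01-01"),  # outside the period: removed by WHERE
    ],
)

# One group per (name, email); HAVING keeps only groups that repeat.
counts = conn.execute(
    """
    SELECT name, email, COUNT(*) AS times
    FROM table_name
    WHERE date BETWEEN ? AND ?
    GROUP BY name, email
    HAVING COUNT(*) > 1
    ORDER BY times DESC, name
    """,
    ("2005-01-01", "2005-12-31"),
).fetchall()
print(counts)  # [('Xyz', 'xyz@email.com', 3), ('Abc', 'abc@email.com', 2), ('Def', 'def@email.com', 2)]
```

A single GROUP BY pass replaces the sort-and-recurse idea from the question; dropping the HAVING line also lists records that occur only once.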
You can use the COUNT function to do this.
To do this using your own query:
SELECT Name, Email, COUNT(*) AS count
FROM table_name
WHERE date BETWEEN '2005-01-01' AND '2005-12-31'
GROUP BY Name, Email
However, your example query is poor, so I cannot give you a better solution. Here is an example of this working: SQL Fiddle
EDIT: Updated my solution to match your expected output.

Return All Historical Records for Accounts with Change in Specific Associated Value

I am trying to select all records in a time-variant Account table for each account with a change in an associated value (e.g. the maturity date). A change in the value will result in the most recent record for an account being end-dated and a new record (containing a new effective date of the following day) being created. The most recent records for accounts in this table have an end-date of 12/31/9000.
For instance, in the below illustration, account 44444444 would not be included in my query result set since it hasn't had a change in the value (and thus also has no additional records aside from the original); however, the other accounts have multiple changes in values (and multiple records), so I would want to see those returned.
The table also has a number of other columns, not shown below, whose changes can also trigger a new record; however, I only want to retrieve all records for those accounts where the figure in the "value" column has changed. What are some ways to obtain the results I need?
Note: The primary key for this table includes the acct_id and eff_dt, and I'm using PostgreSQL within a Greenplum environment.
Here are two queries I tried, both of which produced problematic results:
Query 1
Query 2
I think you want window functions to compare the value:
select t.*
from (select t.*,
min(t.value) over (partition by t.acct_id) as min_value,
max(t.value) over (partition by t.acct_id) as max_value
from t
) t
where min_value <> max_value;
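A minimal sketch of the window-function answer, run against an in-memory SQLite database (window functions need SQLite 3.25 or later) as a stand-in for PostgreSQL/Greenplum. The accounts, dates, and the extra `other` column are hypothetical; account 22222222 has several records but only `other` changes, so it should be excluded just like the single-record account 44444444.

```python
import sqlite3

# Hypothetical time-variant rows; the current record is end-dated 9000-12-31.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (acct_id INTEGER, eff_dt TEXT, end_dt TEXT, value TEXT, other TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?, ?)",
    [
        (11111111, "2019-01-01", "2019-06-30", "2025-01-01", "a"),
        (11111111, "2019-07-01", "9000-12-31", "2026-01-01", "a"),  # value changed
        (22222222, "2019-01-01", "2019-03-31", "2030-05-01", "a"),
        (22222222, "2019-04-01", "9000-12-31", "2030-05-01", "b"),  # only 'other' changed
        (44444444, "2019-01-01", "9000-12-31", "2028-01-01", "a"),  # single record
    ],
)

# Every row carries its account's min and max value; if they differ,
# the value changed at some point, so all of that account's rows are kept.
QUERY = """
SELECT acct_id, eff_dt, value
FROM (SELECT t.*,
             MIN(t.value) OVER (PARTITION BY t.acct_id) AS min_value,
             MAX(t.value) OVER (PARTITION BY t.acct_id) AS max_value
      FROM t) AS x
WHERE min_value <> max_value
ORDER BY acct_id, eff_dt
"""
rows = conn.execute(QUERY).fetchall()
print(rows)  # [(11111111, '2019-01-01', '2025-01-01'), (11111111, '2019-07-01', '2026-01-01')]
```

One design caveat: MIN and MAX ignore NULLs, so a change between NULL and a non-NULL value is not detected by this comparison.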

Get latest data for all people in a table and then filter based on some criteria

I am attempting to return the row with the highest timestamp value (an integer) for each person (who has multiple entries) in a table. Additionally, I am only interested in rows whose field contains ABCD, but that filter should be applied after selecting the latest (max timestamp) entry for each person.
SELECT table."person", max(table."timestamp")
FROM table
WHERE table."type" = 1
HAVING table."field" LIKE '%ABCD%'
GROUP BY table."person"
For some reason, I am not getting the data I expect: the returned table is nearly twice the expected size. Is there a step here that I am getting wrong?
You can first return a table with the max(timestamp) per person and then use it as a subquery in another SELECT statement. Here is the query:
SELECT t."person", t."timestamp"
FROM table t
JOIN (SELECT "person", MAX("timestamp") AS max_ts FROM table WHERE "type" = 1 GROUP BY "person") latest
  ON t."person" = latest."person" AND t."timestamp" = latest.max_ts
WHERE t."type" = 1 AND t."field" LIKE '%ABCD%';
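A minimal sketch contrasting "latest row first, then filter" with "filter first, then latest", run against an in-memory SQLite database through Python's sqlite3 module. The table name `events` is a hypothetical stand-in for the placeholder `table`, and the rows are invented to show the difference.

```python
import sqlite3

# Hypothetical data; "events" replaces the placeholder table name.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE events (person TEXT, "timestamp" INTEGER, type INTEGER, field TEXT)')
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?, ?)",
    [
        ("ann", 100, 1, "xxABCDxx"),
        ("ann", 200, 1, "no match"),  # ann's latest row does not contain ABCD
        ("bob", 150, 1, "no match"),
        ("bob", 300, 1, "ABCD"),      # bob's latest row does
        ("cat", 400, 2, "ABCD"),      # wrong type: ignored entirely
    ],
)

# Latest row per person (join back on the max timestamp), then the ABCD filter.
latest_then_filter = conn.execute(
    """
    SELECT e.person, e."timestamp"
    FROM events AS e
    JOIN (SELECT person, MAX("timestamp") AS max_ts
          FROM events WHERE type = 1 GROUP BY person) AS latest
      ON e.person = latest.person AND e."timestamp" = latest.max_ts
    WHERE e.type = 1 AND e.field LIKE '%ABCD%'
    """
).fetchall()

# For contrast: filtering in WHERE first also returns ann's older ABCD row.
filter_then_latest = conn.execute(
    """
    SELECT person, MAX("timestamp") FROM events
    WHERE type = 1 AND field LIKE '%ABCD%'
    GROUP BY person ORDER BY person
    """
).fetchall()

print(latest_then_filter)  # [('bob', 300)]
print(filter_then_latest)  # [('ann', 100), ('bob', 300)]
```

If two rows for a person tie on the maximum timestamp, the join returns both; the question does not say how ties should be handled.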
Direct answer: as I understand your end goal, just move the HAVING clause to the WHERE section:
SELECT
table."person", MAX(table."timestamp")
FROM table
WHERE
table."type" = 1
AND table."field" LIKE '%ABCD%'
GROUP BY table."person";
This should return no more than 1 row per table."person", with their associated maximum timestamp.
As an aside, I'm surprised your query worked at all. Your HAVING clause referenced a column that is neither in the GROUP BY list nor inside an aggregate. From the documentation (and my experience):
The fundamental difference between WHERE and HAVING is this: WHERE selects input rows before groups and aggregates are computed (thus, it controls which rows go into the aggregate computation), whereas HAVING selects group rows after groups and aggregates are computed.
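A small sketch of the quoted distinction, run against an in-memory SQLite database through Python's sqlite3 module with hypothetical rows: WHERE removes input rows before grouping, and HAVING then removes whole groups based on an aggregate, which is also how "people with multiple entries" can be expressed.

```python
import sqlite3

# Hypothetical rows; "events" stands in for the placeholder table name.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE events (person TEXT, "timestamp" INTEGER, type INTEGER)')
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [("ann", 100, 1), ("ann", 200, 1), ("bob", 150, 1), ("bob", 300, 2), ("cat", 50, 1)],
)

# WHERE drops bob's type-2 row before COUNT runs, so bob's group has one row;
# HAVING then keeps only groups with more than one row.
rows = conn.execute(
    """
    SELECT person, COUNT(*) AS n, MAX("timestamp") AS latest
    FROM events
    WHERE type = 1
    GROUP BY person
    HAVING COUNT(*) > 1
    """
).fetchall()
print(rows)  # [('ann', 2, 200)]
```

Without the WHERE clause, bob would also have two rows and would pass the HAVING test.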

Create Select distinct query with criteria of having the latest date

I have been struggling with creating a query in Access to select a distinct field with the criteria of having the newest entry in the database.
Here's a brief summary of what my table consists of. I have a table with surveying data collected from 2007 to the present. It has a field with each survey mark's name and the corresponding adjustment data, which includes a field with the adjustment date. Many of the marks have been occupied multiple times, and I only want to retrieve the most recent occupation information.
Roughly, I want to:
SELECT DISTINCT STATUS_POINT_DESIGNATION
FROM __ALL_ADJUSTMENTS
WHERE [__ALL_ADJUSMENTS]![ADJ_DATE]=MAX(ADJ_DATE)
I am confused about how to combine selecting a distinct value with a constraint. Any suggestions?
It seems you could achieve your aim of getting the latest observation for each survey point with an aggregate (summary) query:
SELECT STATUS_POINT_DESIGNATION, Max(ADJ_DATE) AS LatestDate, Count(STATUS_POINT_DESIGNATION) AS Observations
FROM __ALL_ADJUSTMENTS
GROUP BY STATUS_POINT_DESIGNATION;
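A minimal sketch of the summary query, run against an in-memory SQLite database through Python's sqlite3 module as a stand-in for Access. The marks and dates are hypothetical; they are stored as ISO strings here so that MAX picks the latest date (Access's Date/Time type does not need this).

```python
import sqlite3

# Hypothetical adjustment rows; SQLite stands in for Access.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE __ALL_ADJUSTMENTS (STATUS_POINT_DESIGNATION TEXT, ADJ_DATE TEXT)")
conn.executemany(
    "INSERT INTO __ALL_ADJUSTMENTS VALUES (?, ?)",
    [
        ("MARK A", "2007-05-01"),
        ("MARK A", "2015-08-20"),
        ("MARK A", "2021-03-11"),
        ("MARK B", "2010-01-15"),
        ("MARK C", "2009-06-30"),
        ("MARK C", "2019-10-02"),
    ],
)

# One row per mark: its most recent adjustment date and how many times it was observed.
result = conn.execute(
    """
    SELECT STATUS_POINT_DESIGNATION,
           MAX(ADJ_DATE) AS LatestDate,
           COUNT(STATUS_POINT_DESIGNATION) AS Observations
    FROM __ALL_ADJUSTMENTS
    GROUP BY STATUS_POINT_DESIGNATION
    ORDER BY STATUS_POINT_DESIGNATION
    """
).fetchall()
print(result)  # [('MARK A', '2021-03-11', 3), ('MARK B', '2010-01-15', 1), ('MARK C', '2019-10-02', 2)]
```

This returns only the latest date per mark; the other adjustment fields of that occupation would have to be fetched by joining back on the mark and its LatestDate.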

In SQL, why does group by make a difference when using having count()

I have a table that stores zone_id. Sometimes a zone_id appears twice in the table. I wrote a query to show only the zone_ids that appear two or more times.
The following query returns the correct result:
select *, count(zone_id)
from proxies.storage_used
group by zone_id desc
having count(zone_id) > 1;
However, if I group by last_updated or company_id, it returns seemingly random values. If I leave out the GROUP BY clause, it displays only one row, as shown in the screenshots below. The first output is from the query above; the second is from the same query without the GROUP BY line, and it returns only one row:
Correction: I'm a new member and can't post pictures directly, so I uploaded them to Minus: http://min.us/m3yrlkSMu#1o
While my query works, I don't understand why. Can somebody help me understand why group by is altering the actual output, instead of only the grouping of the output? I am using MySQL.
A group by divides the resulting rows into groups and performs the aggregate function on the records in each group. If you do a count(*) without a group by, you will get a single count of all rows in the table: since you didn't specify a group by, there is only one group, containing all records in the table. If you do a count(*) with a group by of zone id, you will get a count of how many records there are for each zone id. If you do a count(*) with a group by of zone id and last updated date, you will get a count of how many rows were updated on each date in each zone.
Without a group by clause, everything is in the same group, so you get a single result. If there is more than one row in your table, the having condition succeeds, so you end up counting all the rows in your table.
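A small sketch of the three cases described above, run against an in-memory SQLite database through Python's sqlite3 module rather than MySQL. The table is named `storage_used` (without the `proxies.` schema prefix) and its rows are hypothetical.

```python
import sqlite3

# Hypothetical rows: zones 1 and 3 appear twice, zone 2 once.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE storage_used (zone_id INTEGER, company_id INTEGER, last_updated TEXT)")
conn.executemany(
    "INSERT INTO storage_used VALUES (?, ?, ?)",
    [(1, 10, "2012-01-01"), (1, 10, "2012-01-02"), (2, 20, "2012-01-01"),
     (3, 30, "2012-01-03"), (3, 30, "2012-01-03")],
)

# No GROUP BY: the whole table is one group, so a single count comes back.
whole_table = conn.execute("SELECT COUNT(zone_id) FROM storage_used").fetchall()

# GROUP BY zone_id: one group per zone; HAVING keeps zones seen more than once.
dup_zones = conn.execute(
    "SELECT zone_id, COUNT(*) FROM storage_used "
    "GROUP BY zone_id HAVING COUNT(*) > 1 ORDER BY zone_id"
).fetchall()

# GROUP BY zone_id, last_updated: smaller groups, so the counts change.
per_zone_and_day = conn.execute(
    "SELECT zone_id, last_updated, COUNT(*) FROM storage_used "
    "GROUP BY zone_id, last_updated ORDER BY zone_id, last_updated"
).fetchall()

print(whole_table)       # [(5,)]
print(dup_zones)         # [(1, 2), (3, 2)]
print(per_zone_and_day)  # [(1, '2012-01-01', 1), (1, '2012-01-02', 1), (2, '2012-01-01', 1), (3, '2012-01-03', 2)]
```

Zone 1 is a duplicate when grouped by zone alone, but once last_updated is added to the GROUP BY its two rows fall into different groups, which is why the grouping columns change the output and not just its order.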
source
From what I understand, a query with HAVING but without GROUP BY makes sense in only two situations:
You have a where clause, and you want to test a condition on an aggregation of all rows that satisfy that clause.
Same as above, but for all rows in your table (in practice, it doesn't make sense, though).
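A minimal sketch of the first situation, using Python's sqlite3 module and hypothetical rows. SQLite versions before 3.39 reject HAVING without GROUP BY, so this sketch swaps in an equivalent form: an outer WHERE on the one-row aggregate subquery, which returns either that row or nothing, just as `SELECT COUNT(*) ... WHERE company_id = ? HAVING COUNT(*) > 1` would.

```python
import sqlite3

# Hypothetical rows: company 10 has two rows, company 20 has one.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE storage_used (zone_id INTEGER, company_id INTEGER)")
conn.executemany("INSERT INTO storage_used VALUES (?, ?)", [(1, 10), (1, 10), (2, 20), (3, 30)])

def rows_if_more_than_one(company_id):
    # Equivalent of: SELECT COUNT(*) FROM storage_used WHERE company_id = ? HAVING COUNT(*) > 1
    # The aggregate over all rows matching WHERE forms a single group; the outer
    # WHERE then keeps or discards that one row.
    return conn.execute(
        """
        SELECT n FROM (SELECT COUNT(*) AS n FROM storage_used WHERE company_id = ?) AS agg
        WHERE n > 1
        """,
        (company_id,),
    ).fetchall()

print(rows_if_more_than_one(10))  # [(2,)]
print(rows_if_more_than_one(20))  # []
```

In MySQL, the HAVING form can be written directly, as described in the list above.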