Join results from distinct query with other results in oracle - sql

I have a requirement where I need to select the most recent occurrences of unique_customer_id and message_id, with no more than one customer row per message id. I also need to return the associated data for channel and reason_code, but these can and will contain duplicate data. I also have a unique_row_id I can use if I need to. How can I add those two fields to my current query (or do it some other way altogether)?
SELECT DISTINCT unique_customer_id, message_id, MAX(date)
FROM Table1
GROUP BY unique_customer_id, message_id

If you need to add two more columns, and you know they will duplicate the rows (because each unique_customer_id, message_id combination will have more than one value in channel and reason_code), you have to use an aggregate function on those columns. The question is, which one? How will you determine which channel and which reason code should be selected?
Example:
SELECT unique_customer_id, message_id, MAX(channel), MAX(reason_code), MAX(date)
FROM Table1
GROUP BY unique_customer_id, message_id;
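One caveat with MAX on every column is that the returned channel and reason_code may come from different rows than the latest date. A minimal sketch of an alternative, assuming the goal is to take both values from the most recent row: rank the rows per (unique_customer_id, message_id) with ROW_NUMBER and keep rank 1. The sketch runs the SQL through Python's sqlite3 so it is self-contained; the sample data is invented, and the date column is renamed to sent_date here (a hypothetical name) because DATE is a reserved word in Oracle.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Table1 (
        unique_row_id      INTEGER PRIMARY KEY,
        unique_customer_id INTEGER,
        message_id         INTEGER,
        channel            TEXT,
        reason_code        TEXT,
        sent_date          TEXT  -- hypothetical name standing in for the date column
    )
""")
conn.executemany(
    "INSERT INTO Table1 VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 100, 1, "email", "Z9", "2020-01-01"),
        (2, 100, 1, "sms",   "A1", "2020-02-01"),
        (3, 100, 2, "email", "B2", "2020-01-15"),
        (4, 200, 1, "push",  "C3", "2020-03-01"),
    ],
)

# Keep the single most recent row per (customer, message); unique_row_id breaks ties.
latest_rows = conn.execute("""
    SELECT unique_customer_id, message_id, channel, reason_code, sent_date
    FROM (
        SELECT t.*,
               ROW_NUMBER() OVER (
                   PARTITION BY unique_customer_id, message_id
                   ORDER BY sent_date DESC, unique_row_id DESC
               ) AS rn
        FROM Table1 t
    ) ranked
    WHERE rn = 1
    ORDER BY unique_customer_id, message_id
""").fetchall()

# For comparison: MAX on each column can combine values from different rows.
max_rows = conn.execute("""
    SELECT unique_customer_id, message_id,
           MAX(channel), MAX(reason_code), MAX(sent_date)
    FROM Table1
    GROUP BY unique_customer_id, message_id
    ORDER BY unique_customer_id, message_id
""").fetchall()

print(latest_rows[0])  # channel and reason_code both come from row 2
print(max_rows[0])     # reason_code comes from row 1, the date from row 2
```

The same ROW_NUMBER query shape should carry over to Oracle; only the table setup is SQLite-specific.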

Related

Return All Historical Account Records for Accounts with Change in Corresponding Value

I am trying to select all records in a time-variant Account table for each account with a change in an associated value (e.g. the maturity date). A change in the value will result in the most recent record for an account being end-dated and a new record (containing a new effective date of the following day) being created. The most recent records for accounts in this table have an end-date of 12/31/9000.
For instance, in the below illustration, account 44444444 would not be included in my query result set since it hasn't had a change in the value (and thus also has no additional records aside from the original); however, the other accounts have multiple changes in values (and multiple records), so I would want to see those returned.
I've tried using the row_number function, as well as a reflexive join, but for some reason I'm not getting the expected results. What are some ways to obtain the results I need?
Note: The primary key for this table includes the acct_id and eff_dt. Also, I'm using PostgreSQL in a Greenplum environment.
Here are two types of queries I tried to use but which produced problematic results:
Query 1
Query 2
If you want only the accounts, use aggregation:
select acct_id
from t
group by acct_id
having min(value) <> max(value);
Based on your description, you could also use count(*) > 1.
If you want the original records, you can use window functions:
select t.*
from (select t.*, count(*) over (partition by acct_id) as cnt
from t
) t
where cnt > 1;
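As a rough check of both queries above, here is a small sketch run through Python's sqlite3. The column names end_dt and maturity_dt and the sample rows are assumptions made for illustration; only acct_id, eff_dt and the 12/31/9000 end-date convention come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE t (
        acct_id     INTEGER,
        eff_dt      TEXT,
        end_dt      TEXT,  -- hypothetical column name
        maturity_dt TEXT,  -- hypothetical stand-in for the tracked value
        PRIMARY KEY (acct_id, eff_dt)
    )
""")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        (11111111, "2016-01-01", "2016-06-30", "2020-01-01"),
        (11111111, "2016-07-01", "9000-12-31", "2021-01-01"),
        (44444444, "2016-01-01", "9000-12-31", "2020-01-01"),
    ],
)

# Accounts only: the tracked value differs across the account's records.
changed_accounts = [
    row[0]
    for row in conn.execute("""
        SELECT acct_id
        FROM t
        GROUP BY acct_id
        HAVING MIN(maturity_dt) <> MAX(maturity_dt)
    """)
]

# Full history: every record of an account that has more than one record.
history = conn.execute("""
    SELECT acct_id, eff_dt, end_dt, maturity_dt
    FROM (
        SELECT t.*, COUNT(*) OVER (PARTITION BY acct_id) AS cnt
        FROM t
    ) counted
    WHERE cnt > 1
    ORDER BY acct_id, eff_dt
""").fetchall()

print(changed_accounts)
print(history)
```

Account 44444444 has a single record, so it drops out of both results.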

Having an issue with selecting max rows by date

I am trying to select the max timestamped records from table 1 based on some data from table 2. I am getting the correct records based on the WHERE limits I have put on the query, but I am still getting duplicate entries instead of only the max timestamped entries. Any ideas on what is wrong with the query?
Basically the ID 901413368 has access to certain leveltypes and I'm trying to find out what the max dated requests were that were put in for that same person for the leveltypes that person manages.
SELECT
MAX(timestamp) AS maxtime, Leveltype, assign_ID
FROM dbo.q_Archive
WHERE
(leveltype IN
(SELECT leveltype FROM dbo.idleveltypes WHERE (id = 901413368)))
GROUP BY timestamp, assign_ID, leveltype
HAVING (assign_ID = '901413368')
UPDATE: The issue has been resolved by WEI_DBA's response below:
Remove the timestamp column from your Group By. Also put the assign_ID in the Where Clause and remove the Having clause
The following may be what you want. It should also be a simpler way to write the query:
SELECT MAX(a.timestamp) AS maxtime, a.Leveltype, a.assign_ID
FROM dbo.q_Archive a JOIN
dbo.idleveltypes lt
ON a.leveltype = lt.leveltype AND
a.assign_ID = lt.id
WHERE a.assign_ID = 901413368
GROUP BY a.assign_ID, a.leveltype;
Notes:
Filter on assign_ID before doing the group by. That is much more efficient.
A JOIN is the more typical way to represent the relationship between two tables.
The JOIN condition should be on all the columns needed for matching; there appear to be two.
I don't understand why the leveltype table would have a column called id, but this is your data structure.
The GROUP BY does not need timestamp.
Decide on the type for the id column that should be 901413368. Is it a number or a string? Only use single quotes for string and date constants.
Remove timestamp from the GROUP BY clause, since you're already selecting MAX(timestamp).
You should not add aggregated fields to the GROUP BY clause.
SELECT
MAX(timestamp) AS maxtime,
Leveltype,
assign_ID
FROM
dbo.q_Archive
WHERE
(leveltype IN (SELECT leveltype
FROM dbo.idleveltypes
WHERE (id = 901413368)))
GROUP BY assign_ID, leveltype
HAVING (assign_ID = '901413368')
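To see why grouping by timestamp produces the "duplicates", here is a minimal sketch using Python's sqlite3 with invented sample rows (the dbo. schema prefix is dropped because SQLite has no such schema). With timestamp in the GROUP BY, every distinct timestamp forms its own group; without it, there is one row per leveltype.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE q_Archive (assign_ID INTEGER, leveltype TEXT, timestamp TEXT)")
conn.execute("CREATE TABLE idleveltypes (id INTEGER, leveltype TEXT)")
conn.executemany(
    "INSERT INTO q_Archive VALUES (?, ?, ?)",
    [
        (901413368, "A", "2019-01-01 10:00"),
        (901413368, "A", "2019-03-01 09:00"),
        (901413368, "B", "2019-02-01 08:00"),
        (901413368, "C", "2019-04-01 12:00"),  # leveltype not managed by this id
        (555,       "A", "2019-05-01 07:00"),  # different person
    ],
)
conn.executemany(
    "INSERT INTO idleveltypes VALUES (?, ?)",
    [(901413368, "A"), (901413368, "B")],
)

FILTER = """
    WHERE assign_ID = 901413368
      AND leveltype IN (SELECT leveltype FROM idleveltypes WHERE id = 901413368)
"""

# Original shape: timestamp in GROUP BY, so each timestamp is its own group.
grouped_by_timestamp = conn.execute(
    "SELECT MAX(timestamp) AS maxtime, leveltype, assign_ID FROM q_Archive"
    + FILTER
    + "GROUP BY timestamp, assign_ID, leveltype"
).fetchall()

# Fixed shape: one group, and therefore one MAX, per leveltype.
grouped_by_leveltype = conn.execute(
    "SELECT MAX(timestamp) AS maxtime, leveltype, assign_ID FROM q_Archive"
    + FILTER
    + "GROUP BY assign_ID, leveltype ORDER BY leveltype"
).fetchall()

print(len(grouped_by_timestamp))
print(grouped_by_leveltype)
```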

Order By clause on an aggregated SQL request

Let's say my table schema is like below (it's only a simplified example):
MyTable (table name)
ID - int (unique, auto increment)
Message - string
Timestamp - Datetime
I want to select the number of ID, group them by message and order them by timestamp, so I'll do something like this:
SELECT count (ID), Message FROM MyTable
GROUP BY (Message)
ORDER BY Timestamp desc
However, SQL Server management studio throws me this error:
Column 'Timestamp ' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
The problem is that if I put Timestamp in the Group By statement with Message, it messes up my grouping. The other suggestion to put Timestamp in an aggregate function doesn't make sense (ordering by say, count(Timestamp) doesn't mean anything...)
Any idea on how to do this?
Thanks a lot!
Looking for something like this?
SELECT Message, count (ID), max(Timestamp) as maxDate FROM MyTable
GROUP BY (Message)
ORDER BY maxDate desc
When you do aggregation, you are GROUPing rows together based on certain criteria. This means that each row of your result set actually represents multiple rows in the raw data.
When you want to ORDER BY Timestamp, there will be MULTIPLE timestamp values for each row in the result set, since each one of those rows represents several rows of data.
So, you need to decide which timestamp you want for each set. The MAX? The MIN? You will need to aggregate that field as well to get accurate or meaningful results.
Let's say the same message is in your table multiple times:
1|the mackerel likes frying|1/1/1917
2|at night all cats are grey|12/15/1956
3|the mackerel likes frying|2/2/1918
And you want to group by the message-string, counting the number of times the message appears in the table:
the mackerel likes frying|2
at night all cats are grey|1
The timestamp column is NOT part of the aggregation aka the grouping, but is part of the detail row. It CANNOT appear in the grouping, because timestamp is not "it" (singular) but timestamps, they, plural. There are two different timestamps in the example above for the mackerel message. Which one would you choose? How would the query know which one it was? All you have at your disposal are the aggregate functions:
min(timestamp)
max(timestamp)
count(timestamp)
and if it were other than a datetime, you'd also have AVG(timestamp).
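The mackerel example can be tried out with a short sketch in Python's sqlite3. The dates are stored as ISO strings here (an assumption, so that MIN and MAX compare them chronologically), and the aliases n, first_seen and last_seen are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE MyTable (
        ID        INTEGER PRIMARY KEY AUTOINCREMENT,
        Message   TEXT,
        Timestamp TEXT
    )
""")
conn.executemany(
    "INSERT INTO MyTable (Message, Timestamp) VALUES (?, ?)",
    [
        ("the mackerel likes frying",  "1917-01-01"),
        ("at night all cats are grey", "1956-12-15"),
        ("the mackerel likes frying",  "1918-02-02"),
    ],
)

# One row per message; the ordering has to name which timestamp of the group it means.
by_latest = conn.execute("""
    SELECT Message, COUNT(ID) AS n,
           MIN(Timestamp) AS first_seen,
           MAX(Timestamp) AS last_seen
    FROM MyTable
    GROUP BY Message
    ORDER BY MAX(Timestamp) DESC
""").fetchall()

for row in by_latest:
    print(row)
```

The mackerel group carries two timestamps, which is why the ORDER BY has to pick one of them through an aggregate.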
If you want to order the messages based on the max timestamp within the Group then try:
SELECT count (ID), Message
FROM MyTable
GROUP BY (Message)
ORDER BY MAX(Timestamp) DESC
The problem here is that you probably have multiple messages that are the same, but with different timestamps, because you're grouping by message. If you have two messages 'hello' with different timestamps, which should it use for the order by?
This is one way. You could also do a trick with cross apply or row_number.
SELECT count(ID), Message FROM MyTable
GROUP BY (Message)
ORDER BY Max(Timestamp) desc

Using a DISTINCT clause to filter data but still pull other fields that are not DISTINCT

I am trying to write a query in PostgreSQL that pulls a set of ordered data and filters it by a distinct field. I also need to pull several other fields from the same table row, but they need to be left out of the distinct evaluation. Example:
SELECT DISTINCT(user_id) user_id,
created_at
FROM creations
ORDER BY created_at
LIMIT 20
I need the user_id to be DISTINCT, but don't care whether the created_at date is unique or not. Because the created_at date is being included in the evaluation, I am getting duplicate user_id in my result set.
Also, the data must be ordered by the date, so using DISTINCT ON is not an option here. It requires that the DISTINCT ON field be the first field in the ORDER BY clause, and that does not deliver the results that I seek.
How do I properly use the DISTINCT clause but limit its scope to only one field while still selecting other fields?
As you've discovered, standard SQL treats DISTINCT as applying to the whole select-list, not just one column or a few columns. The reason for this is that it's ambiguous what value to put in the columns you exclude from the DISTINCT. For the same reason, standard SQL doesn't allow you to have ambiguous columns in a query with GROUP BY.
But PostgreSQL has a nonstandard extension to SQL to allow for what you're asking: DISTINCT ON (expr).
SELECT DISTINCT ON (user_id) user_id, created_at
FROM creations
ORDER BY user_id, created_at
LIMIT 20
You have to include the distinct expression(s) as the leftmost part of your ORDER BY clause.
See the manual on DISTINCT Clause for more information.
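The point that DISTINCT applies to the whole select list can be illustrated with a small sketch. It uses Python's sqlite3, which has no DISTINCT ON, so the second query swaps in GROUP BY with an explicit MIN to say which created_at to keep; the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE creations (user_id INTEGER, created_at TEXT)")
conn.executemany(
    "INSERT INTO creations VALUES (?, ?)",
    [(1, "2010-01-01"), (1, "2010-03-01"), (2, "2010-02-01")],
)

# DISTINCT looks at (user_id, created_at) pairs, so user 1 appears twice.
whole_list = conn.execute("""
    SELECT DISTINCT user_id, created_at
    FROM creations
    ORDER BY created_at
""").fetchall()

# Naming an aggregate removes the ambiguity about which created_at to return.
one_per_user = conn.execute("""
    SELECT user_id, MIN(created_at) AS created_at
    FROM creations
    GROUP BY user_id
    ORDER BY user_id
""").fetchall()

print(whole_list)
print(one_per_user)
```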
If you want the most recent created_at for each user then I suggest you aggregate like this:
SELECT user_id, MAX(created_at)
FROM creations
WHERE ....
GROUP BY user_id
ORDER BY MAX(created_at) DESC
This will return the most recent created_at for each user_id
If you only want the top 20, then append
LIMIT 20
EDIT: This is basically the same thing Unreason said above... define from which row you want the data by aggregation.
The GROUP BY should ensure distinct values of the grouped columns, this might give you what you are after.
(Note I'm putting in my 2 cents even though I am not familiar with PostgreSQL, but rather MySQL and Oracle)
In MySQL (this only runs when the ONLY_FULL_GROUP_BY SQL mode is disabled)
SELECT user_id, created_at
FROM creations
GROUP BY user_id
ORDER BY user_id
In Oracle sqlplus
SELECT user_id, MIN(created_at)
FROM creations
GROUP BY user_id
ORDER BY user_id
The Oracle query gives you the user_id followed by the earliest created_at associated with that user_id. Oracle has no FIRST aggregate function, so an explicit aggregate such as MIN is needed. The MySQL query returns an arbitrary created_at for each user_id, because the column is neither grouped nor aggregated. If you want a different created_at, substitute MIN with another aggregate function such as MAX.
Your question is not well defined - when you say you need also other data from the same row you are not defining which row.
You do say you need to order the results by created_at, so I will assume that you want values from the row with min created_at (earliest).
This now becomes one of the most common SQL questions on SO: retrieving rows containing some aggregate value (MIN, MAX).
For example
SELECT user_id, MIN(created_at) AS created_at
FROM creations
GROUP BY user_id
ORDER BY MIN(created_at)
LIMIT 20
This approach will not let you (easily) pick other values from the same row.
One approach that will let you pick other values is
SELECT c.user_id, c.created_at, c.other_columns
FROM creations c LEFT JOIN creations c_help
ON c.user_id = c_help.user_id AND c.created_at > c_help.created_at
WHERE c_help.user_id IS NULL
ORDER BY c.created_at
LIMIT 20
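A minimal sketch of the self-join approach above, run through Python's sqlite3 with invented rows. The title column is a hypothetical stand-in for other_columns. A ROW_NUMBER version is included for comparison; it is a swapped-in alternative, not something the answer itself proposes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE creations (user_id INTEGER, created_at TEXT, title TEXT)")
conn.executemany(
    "INSERT INTO creations VALUES (?, ?, ?)",
    [
        (1, "2010-01-01", "a"),
        (1, "2010-03-01", "b"),
        (2, "2010-02-01", "c"),
        (3, "2010-01-15", "d"),
        (2, "2010-04-01", "e"),
    ],
)

# Anti-join: keep a row only if no earlier row exists for the same user.
anti_join = conn.execute("""
    SELECT c.user_id, c.created_at, c.title
    FROM creations c
    LEFT JOIN creations c_help
           ON c.user_id = c_help.user_id
          AND c.created_at > c_help.created_at
    WHERE c_help.user_id IS NULL
    ORDER BY c.created_at
    LIMIT 20
""").fetchall()

# Window-function alternative: number each user's rows by date and keep the first.
ranked = conn.execute("""
    SELECT user_id, created_at, title
    FROM (
        SELECT c.*,
               ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY created_at) AS rn
        FROM creations c
    ) numbered
    WHERE rn = 1
    ORDER BY created_at
    LIMIT 20
""").fetchall()

print(anti_join)
print(ranked)
```

The two differ on ties: if a user has two rows sharing the earliest created_at, the anti-join returns both, while ROW_NUMBER keeps exactly one.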
Using a sub-query was suggested by someone on the irc #postgresql channel. It worked:
SELECT user_id
FROM (SELECT DISTINCT ON (user_id) * FROM creations) ss
ORDER BY created_at DESC
LIMIT 20;

SQL Query to find if different values exist for a column

I have a temporary table with three columns
pay_id,
id_client_grp,
id_user
Basically, I want to ensure that all rows in this table have the same client group and the same id_user. If not, I want to know which pay_id is the culprit and throw an error to the user.
Can somebody help me with a query?
Thanks,
Rishi
When you say 'culprit,' I assume you mean the pay_id(s) that are not like the others, assuming there is a majority.
The problem is all of the pay_id's could potentially become culprits once your SELECT COUNT(DISTINCT id_client_grp, id_user) returns > 1 record, if there is a relatively even distribution. It is difficult to program for this scenario, since you will need to determine what exactly a majority is.
Your best bet will be to return all distinct combinations of those 3 fields, then decide where to go from there based on your business logic.
So could this question be asked like this:
If I wanted to add a unique index on my table across the three columns: client group, id user, pay id, identify those that break the unique condition where we have non unique pay id for a client group and id user??
select a.id_client_grp, a.id_user, count(*) as pay_id_count from (
/* this returns 1 row per client group and user */
/* if the pay id is the same for all */
select id_client_grp, id_user, pay_id, count(*) as row_count
from tempTable t
group by id_client_grp, id_user, pay_id ) a
group by a.id_client_grp, a.id_user
/* if we have more than one row per client group and user, then we have a dupe, so report them all */
having count(*) > 1
If you want all the rows to have the same values for some set of columns (your question is not entirely clear to me as to what you want to be the same):
Do you know going in WHICH pay_id, id_client_grp all the rows should be? Or do you not care, as long as they are all the same?
If you know the values you are looking for, simply test for rows that are not set to those desired values
Select distinct id_user
From tempTable
Where pay_id <> @PayIdValue
Or id_client_grp <> @ClientGroupIDValue
If you don't care, and just want them all to be the same, and they're not, then you need to specify which of the more than one set of values IS the "culprit" as you said...
If you want some other question answered. please explain more clearly...
Based on your comment, then, to determine if there is more than one id_client_grp, pay_id combination:
Select Count(Distinct id_client_grp, pay_id)
From tempTable
If this = 1, then every record has the same values for these two fields. Any other value indicates that there is more than one set of distinct values in the table.
SELECT DISTINCT p.pay_id,
t.[count]
FROM rishi_table p
INNER JOIN ( SELECT id_client_grp, id_user, COUNT(*) As 'count'
FROM rishi_table
GROUP BY id_client_grp, id_user
HAVING COUNT(*) > 1 ) t
ON p.id_client_grp = t.id_client_grp AND p.id_user = t.id_user
basically create a set with the dupes, and bounce that against the main table to get your offending list.
SELECT DISTINCT id_client_grp, id_user
should let you do something like
IF @@ROWCOUNT > 1 THEN
...
Or possibly SELECT COUNT(DISTINCT id_client_grp, id_user) ...
but that's more vendor-dependent as to its availability and proper syntax.
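Since COUNT(DISTINCT a, b) is not available everywhere, a portable variant counts the rows of a SELECT DISTINCT subquery instead. The sketch below runs it through Python's sqlite3 on invented rows. Treating the most common combination as the "correct" one is purely an assumption made for illustration; as noted above, what counts as the majority is a business decision.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tempTable (pay_id INTEGER, id_client_grp INTEGER, id_user INTEGER)")
conn.executemany(
    "INSERT INTO tempTable VALUES (?, ?, ?)",
    [(1, 10, 7), (2, 10, 7), (3, 10, 7), (4, 20, 7)],
)

# Portable replacement for COUNT(DISTINCT id_client_grp, id_user).
combo_count = conn.execute("""
    SELECT COUNT(*)
    FROM (SELECT DISTINCT id_client_grp, id_user FROM tempTable) combos
""").fetchone()[0]

# Every combination with its row count, most common first.
combos = conn.execute("""
    SELECT id_client_grp, id_user, COUNT(*) AS n
    FROM tempTable
    GROUP BY id_client_grp, id_user
    ORDER BY n DESC, id_client_grp, id_user
""").fetchall()

# Assumption: the most common combination is the expected one.
majority_grp, majority_user, _ = combos[0]
suspects = [
    row[0]
    for row in conn.execute(
        """
        SELECT pay_id
        FROM tempTable
        WHERE id_client_grp <> ? OR id_user <> ?
        ORDER BY pay_id
        """,
        (majority_grp, majority_user),
    )
]

if combo_count > 1:
    print("inconsistent rows, suspect pay_ids:", suspects)
```

With an even split between combinations, combos[0] is just the first by sort order, so the suspect list should not be trusted without a business rule.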