How to run a nested query that depends on a condition - sql

Is it possible to run queries that depend on a condition?
I mean,
I have a table with id, score, amt, time.
I have to group by id and get the max-score record for every id.
If two records have the same id and score, then I have to go by amt; if the amts are also the same, then by time.
Is it possible to do this in a single query?
Thanks in advance.

It's possible if you do a self join. However, the fact that you have two records with the same id suggests that your db might not be normalized. In any event, the general idea is this:
select case
when t1.id = t2.id and t1.score = t2.score then t1.amt
else t1.time end fieldalias
from yourtable t1 join yourtable t2 on something
where whatever
However, this will only work if amt and time are the same datatype. Plus I have no idea what field to use to do your self join.
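For the question as asked (highest score per id, ties broken by amt and then by time), a common alternative to a self join is to rank the rows of each id with ROW_NUMBER() and keep rank 1. Below is a minimal sketch that runs this in SQLite through Python's built-in sqlite3 module (SQLite 3.25 or newer is needed for window functions). The table name scores and the rows are hypothetical, and it is an assumption that a larger amt and a later time win a tie, since the question does not say which direction is preferred.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (id INTEGER, score INTEGER, amt INTEGER, time TEXT)")
conn.executemany(
    "INSERT INTO scores VALUES (?, ?, ?, ?)",
    [
        (1, 90, 10, "2020-01-01"),
        (1, 90, 20, "2020-01-02"),  # same score as above, higher amt wins
        (2, 80, 5, "2020-01-01"),
        (2, 80, 5, "2020-01-03"),   # same score and amt, later time wins
        (3, 70, 1, "2020-01-01"),
        (3, 60, 9, "2020-01-02"),   # lower score, so amt does not matter
    ],
)

# Rank the rows inside each id; the ORDER BY list is the tie-break chain.
# Assumption: higher amt and later time are preferred on ties.
query = """
SELECT id, score, amt, time
FROM (
    SELECT s.*,
           ROW_NUMBER() OVER (
               PARTITION BY id
               ORDER BY score DESC, amt DESC, time DESC
           ) AS rn
    FROM scores s
)
WHERE rn = 1
ORDER BY id
"""
best = conn.execute(query).fetchall()
print(best)
```

ROW_NUMBER() always returns exactly one row per id, even when all three columns tie; use RANK() instead if tied rows should all be returned.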

Related

Select row with max value with condition

I have a table that contains the results of a sports competition. Here it is:
And I need to get a table with the winning teams. That is, from rows with the same MatchId, select the entries where Score is the maximum for that MatchId.
Result should look like this:
I have no idea what the correct SQL query is.
I'm using MSSQL Server 2018. Thank you.
One method is a correlated subquery:
select t.*
from t
where t.score = (select max(t2.score) from t t2 where t2.matchid = t.matchid);
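To see what the correlated subquery returns, here is a sketch that runs it in SQLite from Python. The columns MatchId, Team and Score and all rows are made up, because the question's tables did not survive. Note that a drawn match (two rows sharing the maximum score) returns both rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (MatchId INTEGER, Team TEXT, Score INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        (1, "Red", 3), (1, "Blue", 1),
        (2, "Green", 0), (2, "Gold", 2),
        (3, "Red", 1), (3, "Gold", 1),  # a draw: both rows have the max score
    ],
)

# For each outer row, the subquery computes the max score of that row's match.
winners = conn.execute(
    """
    SELECT t.*
    FROM t
    WHERE t.Score = (SELECT MAX(t2.Score) FROM t t2 WHERE t2.MatchId = t.MatchId)
    ORDER BY t.MatchId, t.Team
    """
).fetchall()
print(winners)
```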

How to update a row's field with the value of another row that is closest to it by date?

I have a huge table with 2m+ rows.
The structure is as follows:
ThingName (STRING),
Date (DATE),
Value (INT64)
Sometimes Value is NULL, and I need to fix it by setting it to the non-NULL Value of the row with the same ThingName that is closest to it by Date.
And I am really not an SQL person.
I tried to describe my task with this query (simplified a lot by only looking at previous dates, although I actually need to check future dates too):
update my_tbl as SDP
set SDP.Value = (select SDPI.Value
from my_tbl as SDPI
where SDPI.Date < SDP.Date
and SDP.ThingName = SDPI.ThingName
and SDPI.Value is not null
order by SDPI.Date desc limit 1)
where SDP.Value is null;
Here I try to set the Value of the row being updated to one selected from the same table for the same ThingName, and LIMIT 1 keeps only a single result.
But the query editor tells me this:
Correlated subqueries that reference other tables are not supported unless they can be de-correlated, such as by transforming them into an efficient JOIN.
Actually, I am not at all sure that my task can be solved with just a query.
So, can anyone help me? If this is impossible, please tell me; if it is possible, tell me which SQL constructs may help.
Below is for BigQuery Standard SQL.
In many (if not most) cases you don't want to update your table (as that incurs extra cost and the limitations associated with DML statements); rather, you can adjust the 'missing' values in the query, as in the example below:
#standardSQL
SELECT
ThingName,
date,
IFNULL(value,
LAST_VALUE(value IGNORE NULLS)
OVER(PARTITION BY thingname ORDER BY date)
) AS value
FROM `project.dataset.my_tbl`
If for some reason you actually need to update the table, the statement above will not help, because DML's UPDATE does not allow the use of analytic functions, so you need another approach, such as the one below:
#standardSQL
SELECT
t1.ThingName, t1.date,
ARRAY_AGG(t2.Value IGNORE NULLS ORDER BY t2.date DESC LIMIT 1)[OFFSET(0)] AS value
FROM `project.dataset.my_tbl` AS t1
LEFT JOIN `project.dataset.my_tbl` AS t2
ON t2.ThingName = t1.ThingName
AND t2.date <= t1.date
GROUP BY t1.ThingName, t1.date, t1.value
Now you can use it to update your table, as in the example below:
#standardSQL
UPDATE `project.dataset.my_tbl` t
SET value = new_value
FROM (
SELECT TO_JSON_STRING(t1) AS id,
ARRAY_AGG(t2.Value IGNORE NULLS ORDER BY t2.date DESC LIMIT 1)[OFFSET(0)] new_value
FROM `project.dataset.my_tbl` AS t1
LEFT JOIN `project.dataset.my_tbl` AS t2
ON t2.ThingName = t1.ThingName
AND t2.date <= t1.date
GROUP BY id
)
WHERE TO_JSON_STRING(t) = id
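The carry-forward that LAST_VALUE(value IGNORE NULLS) performs in the first query can also be reproduced on engines without IGNORE NULLS, such as SQLite, with a common workaround: a running COUNT(Value) only increases at non-NULL rows, so it gives each non-NULL value and the NULL rows after it the same group number, and MAX over that group returns the value. Below is a sketch in Python's sqlite3 (SQLite 3.25 or newer for window functions) with hypothetical rows; it illustrates the logic and is not BigQuery code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_tbl (ThingName TEXT, Date TEXT, Value INTEGER)")
conn.executemany(
    "INSERT INTO my_tbl VALUES (?, ?, ?)",
    [
        ("a", "2020-01-01", 10),
        ("a", "2020-01-02", None),
        ("a", "2020-01-03", None),
        ("a", "2020-01-04", 40),
        ("b", "2020-01-01", None),  # no earlier value: stays NULL
        ("b", "2020-01-02", 7),
    ],
)

query = """
SELECT ThingName, Date,
       -- each grp holds at most one non-NULL Value (its first row), so MAX returns it
       MAX(Value) OVER (PARTITION BY ThingName, grp) AS Value
FROM (
    SELECT m.*,
           -- COUNT skips NULLs, so grp only increases on non-NULL rows
           COUNT(Value) OVER (PARTITION BY ThingName ORDER BY Date) AS grp
    FROM my_tbl m
)
ORDER BY ThingName, Date
"""
filled = conn.execute(query).fetchall()
for row in filled:
    print(row)
```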
In BigQuery, updates are rather rare. The logic you seem to want is:
select t.*,
coalesce(value,
lag(value ignore nulls) over (partition by thingname order by date)
) as value
from my_tbl t;
I don't really see a reason to save this back in the table.

Unable to get duplicate records from table

I have a table with the structure given below:
A User_ID has values for its respective items in a specific time interval. An item value can be text or integer, depending on the item.
I want to check whether two or more UserIds have the same values, meaning their items are the same, with the same values and in the same time interval.
In the table above, UserId 213456 and UserId 213458 have the same records.
I tried using cursors and loops, but it's taking too long. My table has more than 50 million UserIds. Is there a way to do this efficiently?
I also tried using GROUP BY with subqueries, but all my attempts failed to produce a good query.
I have created the following query based on "How do I find duplicate values in a table in Oracle?"
select t1.USERID, count(t1.USERID)
from USERS_ITEM_VAL t1
where exists ( select *
from USERS_ITEM_VAL t2
where t1.rowid <> t2.rowid and
t2.ITEMID = t1.ITEMID and
t2.TEXT_VALUE = t1.TEXT_VALUE and
--t2.INTEGER_VALUE = t1.INTEGER_VALUE and
t2.INIT_DATE = t1.INIT_DATE and
t2.FINAL_DATE = t1.FINAL_DATE )
group by t1.USERID having count(t1.USERID) > 1 order by count(t1.USERID);
The problem is that it works when I exclude the INTEGER_VALUE column, but returns nothing when I include INTEGER_VALUE in the join, even though the data in the INTEGER_VALUE column is the same.
Here is the structure of my table:
USERID - NUMBER
ITEMID - NUMBER
TEXT_VALUE - VARCHAR2(500)
INTEGER_VALUE - NUMBER
INIT_DATE - DATE
FINAL_DATE - DATE
One way to approach this uses a self join. The idea is to count the number of items that two users have in common (taking the values and the date columns into account). Then compare this to the number of items that each has. DECODE treats two NULLs as equal, so it copes with the NULL value columns:
with t as (
select u.*, count(*) over (partition by userid) as numitems
from USERS_ITEM_VAL u
)
select t1.userid, t2.userid
from t t1 join
t t2
on t1.userid < t2.userid and
t1.itemid = t2.itemid and
decode(t1.text_value, t2.text_value, 1, 0) = 1 and
decode(t1.integer_value, t2.integer_value, 1, 0) = 1 and
t1.init_date = t2.init_date and
t1.final_date = t2.final_date and
t1.numitems = t2.numitems
group by t1.userid, t2.userid, t1.numitems
having count(*) = t1.numitems;
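Here is a sketch of this self-join idea run in SQLite from Python, comparing TEXT_VALUE and INTEGER_VALUE as well as the item and the dates. It uses SQLite's IS operator, which treats two NULLs as equal (the counterpart of the DECODE/NVL trick in Oracle). The lowercase table name and the rows are hypothetical mirrors of the question, and SQLite 3.25 or newer is assumed for the window function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE users_item_val (
        userid INTEGER, itemid INTEGER, text_value TEXT,
        integer_value INTEGER, init_date TEXT, final_date TEXT)"""
)
rows = [
    # users 213456 and 213458 hold identical items
    (213456, 1, "abc", None, "2020-01-01", "2020-12-31"),
    (213456, 2, None, 5, "2020-01-01", "2020-12-31"),
    (213458, 1, "abc", None, "2020-01-01", "2020-12-31"),
    (213458, 2, None, 5, "2020-01-01", "2020-12-31"),
    # user 213460 differs in one integer value
    (213460, 1, "abc", None, "2020-01-01", "2020-12-31"),
    (213460, 2, None, 6, "2020-01-01", "2020-12-31"),
]
conn.executemany("INSERT INTO users_item_val VALUES (?, ?, ?, ?, ?, ?)", rows)

query = """
WITH t AS (
    SELECT u.*, COUNT(*) OVER (PARTITION BY userid) AS numitems
    FROM users_item_val u
)
SELECT t1.userid, t2.userid
FROM t t1
JOIN t t2
  ON t1.userid < t2.userid
 AND t1.itemid = t2.itemid
 AND t1.text_value IS t2.text_value        -- NULL-safe equality in SQLite
 AND t1.integer_value IS t2.integer_value
 AND t1.init_date = t2.init_date
 AND t1.final_date = t2.final_date
 AND t1.numitems = t2.numitems
GROUP BY t1.userid, t2.userid, t1.numitems
-- every item of t1 must have matched an item of t2
HAVING COUNT(*) = t1.numitems
"""
pairs = conn.execute(query).fetchall()
print(pairs)
```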
The reason your query failed is that either text_value or integer_value will be NULL in every row. For this reason, it's not possible to use an equality predicate in the self-join without using NVL functions to plug the NULL values.
However, below is a query that uses an analytic function to accomplish the goal:
Select * From (
Select t.*, Count(*) Over (Partition By t.itemId,
t.text_value,
t.integer_value,
t.init_date,
t.final_date) as Cnt
From USERS_ITEM_VAL t)
Where cnt > 1;
The query returns all rows where multiple records have identical values in the five columns of the Partition By clause.
A benefit of this technique over the self-join approach is that the table is scanned only once, whereas it would be scanned twice with a self join. This could result in better performance if the table is large.
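The analytic version copes with the NULLs because PARTITION BY places NULLs in the same partition (as GROUP BY does), whereas = never matches a NULL to a NULL. A quick check in SQLite from Python, with hypothetical rows and SQLite 3.25 or newer assumed for window functions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE users_item_val (
        userid INTEGER, itemid INTEGER, text_value TEXT,
        integer_value INTEGER, init_date TEXT, final_date TEXT)"""
)
conn.executemany(
    "INSERT INTO users_item_val VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 7, None, 5, "2020-01-01", "2020-12-31"),
        (2, 7, None, 5, "2020-01-01", "2020-12-31"),  # same as user 1
        (3, 7, None, 6, "2020-01-01", "2020-12-31"),  # different integer value
    ],
)

# An equality self-join finds nothing, because NULL = NULL is not true.
eq_matches = conn.execute(
    """SELECT COUNT(*) FROM users_item_val a JOIN users_item_val b
       ON a.userid <> b.userid AND a.text_value = b.text_value
          AND a.integer_value = b.integer_value"""
).fetchone()[0]

# PARTITION BY groups the NULL text_values together, so users 1 and 2 share a partition.
dupes = conn.execute(
    """SELECT userid FROM (
           SELECT t.*, COUNT(*) OVER (
               PARTITION BY itemid, text_value, integer_value, init_date, final_date
           ) AS cnt
           FROM users_item_val t
       )
       WHERE cnt > 1
       ORDER BY userid"""
).fetchall()
print(eq_matches, dupes)
```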

Explain how this SELECT WHERE subquery works?

Here's the query:
SELECT ID, Name, EventTime, State
FROM mytable as mm Where EventTime IN
(Select MAX(EventTime) from mytable mt where mt.id=mm.id)
Here is the fiddle:
http://sqlfiddle.com/#!3/9630c0/5
It comes from this S.O. question:
Select distinct rows whilst grouping by max value
I would like to hear in plain English how it works. I'm missing some fundamental understanding of part of it.
I don't really understand what the aliases are doing in the mt.id=mm.id part. It selects rows where the id is equal to the id?
The mt.id=mm.id part makes it a correlated subquery, hence the subquery is re-evaluated for each ID.
The query, then, selects the most recent event for each ID.
It basically translates to: "For each id, get me the row whose EventTime is the maximum EventTime for that id."
You can also rewrite the code as
SELECT t1.ID, t1.Name, t1.EventTime, t1.State FROM mytable as t1
inner join
(
select id,max(EventTime) as EventTime from mytable group by id
) as t2 on t1.id=t2.id and t1.EventTime=t2.EventTime
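A quick way to convince yourself that both forms return the same rows is to run them side by side. This sketch uses SQLite from Python with made-up rows; note that if two rows of an id tie on the maximum EventTime, both versions return both rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (ID INTEGER, Name TEXT, EventTime TEXT, State TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?)",
    [
        (1, "a", "2020-01-01 10:00", "open"),
        (1, "a", "2020-01-02 09:00", "closed"),
        (2, "b", "2020-01-01 08:00", "open"),
        (2, "b", "2020-01-03 12:00", "pending"),
        (2, "b", "2020-01-02 12:00", "closed"),
    ],
)

correlated = conn.execute(
    """SELECT ID, Name, EventTime, State
       FROM mytable AS mm
       WHERE EventTime IN
           -- evaluated per outer row, using that row's id
           (SELECT MAX(EventTime) FROM mytable mt WHERE mt.id = mm.id)
       ORDER BY ID"""
).fetchall()

joined = conn.execute(
    """SELECT t1.ID, t1.Name, t1.EventTime, t1.State
       FROM mytable AS t1
       INNER JOIN (
           SELECT id, MAX(EventTime) AS EventTime FROM mytable GROUP BY id
       ) AS t2 ON t1.id = t2.id AND t1.EventTime = t2.EventTime
       ORDER BY t1.ID"""
).fetchall()

print(correlated)
print(correlated == joined)
```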

SQL Find Possible Duplicates

I need SQL code that will identify possible duplicates in a table. Let's say my table has 4 columns:
ID (primary key)
Date1
Date2
GroupID
(Date1, Date2, GroupID) form a unique key.
This table gets populated with blocks of data at a time, and it often happens that a new block is loaded that contains a number of records that are already there. This is fine as long as the unique key catches them. Unfortunately, sometimes Date1 is empty (or at least '1900/01/01'), either in the first or in subsequent uploads.
So what I need is something to identify where the (Date2, GroupID) combination appears more than once and where, for one of the records, Date1 = '1900/01/01'.
Thanks
Karl
bkm kind of has it, but the inner select can perform poorly on some databases.
This is more straightforward:
select t1.* from
t as t1 left join t as t2
on (t1.date2=t2.date2 and t1.groupid=t2.groupid)
where t1.id != t2.id and (t1.date1='1900/01/01' or t2.date1='1900/01/01')
You can identify duplicates on (date2, GroupID) using
Select date2,GroupID
from t
group by date2,GroupID
having count(*) >1
Use this to identify records in the main table that are duplicates:
Select *
from t
where date1='1900/01/01'
and (date2,groupID) IN (Select date2,GroupID
from t
group by date2,GroupID
having count(*) >1)
NOTE: Since Date1, Date2, GroupID form a unique key, check whether your design is right in allowing Date1 to be NULL. You could have a genuine case where Date1 is different for two rows while (date2, GroupID) is the same.
If I understand correctly, you are looking for a group of IDs for which GroupID and Date2 are the same, there's one occurrence of Date1 that's different from 1900/01/01, and all the rest of the Date1s are 1900/01/01.
If I got it right, here's the query for you:
SELECT T1.ID
FROM t T1
WHERE
(T1.GroupID, T1.Date2) IN
(SELECT T2.GroupID, T2.Date2
FROM t T2
WHERE T2.Date1 = '1900/01/01' OR
T2.Date1 IS NULL
GROUP BY T2.GroupID, T2.Date2)
AND
1 >=
(
SELECT COUNT(*)
FROM t T3
WHERE NOT (T3.Date1 = '1900/01/01')
AND NOT (T3.Date1 IS NULL)
AND T3.GroupID = T1.GroupID
AND T3.Date2 = T1.Date2
)
Hope that helps.
In addition to having a PRIMARY KEY field defined on the table, you can also add other UNIQUE constraints to perform the same sort of thing you're asking for. They'll validate that a particular column or set of columns has a unique value in the table.
Check out the entry in the MySQL manual for an example:
http://dev.mysql.com/doc/refman/5.1/en/create-table.html
A check constraint perhaps.
Something along the lines of select count(*) where date1 = '1900/01/01' and date2 = #date2 and groupid = #groupid.
You would just need to check whether you can do this in a table-level constraint.
select * from t a
join (
select Date2, GroupID, count(*) as cnt
from t
group by Date2, GroupID
having count(*) > 1
) b on (a.Date2 = b.Date2 and a.GroupID = b.GroupID)
where a.Date1 = '1900/01/01'
This is the most straightforward way I can think to do it:
SELECT DISTINCT t1.*
FROM t t1 JOIN t t2 USING (date2, groupid)
WHERE t1.date1 = '1900/01/01' AND t2.id <> t1.id;
No need to use GROUP BY, which performs poorly on some brands of database.
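The sketch below runs the USING join in SQLite from Python on hypothetical rows. It includes t2.id <> t1.id, because otherwise every row joins to itself and any row carrying the placeholder date is reported, whether or not it has a twin.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, date1 TEXT, date2 TEXT, groupid INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        (1, "2020-03-01", "2020-04-01", 10),
        (2, "1900/01/01", "2020-04-01", 10),  # possible duplicate of id 1
        (3, "1900/01/01", "2020-05-01", 20),  # placeholder date, but no twin
        (4, "2020-06-01", "2020-07-01", 30),
    ],
)

possible_dupes = conn.execute(
    """SELECT DISTINCT t1.*
       FROM t t1 JOIN t t2 USING (date2, groupid)
       WHERE t1.date1 = '1900/01/01'
         AND t2.id <> t1.id   -- do not let a row match itself
       ORDER BY t1.id"""
).fetchall()
print(possible_dupes)
```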