SQL to find the matching row between two tables of same schema - sql

I have two tables, say X and X_STAGING.
They are identical in columns, i.e. the schema is the same. However, the number of rows is different. I know that the first row of X is in X_STAGING, because the data was partially copied over from X_STAGING to X. However, I need to know exactly which row of X_STAGING contains the data that went into the first row of X.
At the moment I am using this
SELECT
SUM(MATCH)
FROM
(
SELECT
CASE WHEN X_STAGING.KEY_ID='KEY_FROM_THE_FIRST_ROW_OF_X' THEN 1 ELSE 0 END AS MATCH
FROM
X_STAGING
WHERE ROWNUM<2550000
)
By changing ROWNUM I can find out at which ROWNUM the count gets to 1. Then, by adjusting ROWNUM, I can eventually get to the particular row.
This will work, but I am sure there has to be a quicker and more clever way of doing this.
Please help.
Note: I am working on Linux, DB2 environment.

I don't understand what you are trying to accomplish, but the following does what you are asking for:
SELECT
MAX(MATCH)
FROM
(
SELECT
CASE WHEN X_STAGING.KEY_ID='KEY_FROM_THE_FIRST_ROW_OF_X' THEN ROWNUM ELSE 0 END AS MATCH
FROM
X_STAGING
)
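For anyone who wants to try the idea without a DB2 instance, here is a minimal sketch using SQLite through Python's sqlite3 module. The table contents and the load_seq ordering column are made up for illustration, and ROW_NUMBER() stands in for ROWNUM, since ROWNUM is not available in SQLite. Row numbers are only meaningful relative to some ordering, which is why the sketch orders by an explicit column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical staging table; load_seq is an assumed column that defines row order.
conn.execute("CREATE TABLE x_staging (load_seq INTEGER, key_id TEXT)")
conn.executemany(
    "INSERT INTO x_staging VALUES (?, ?)",
    [(1, "K-100"), (2, "K-200"), (3, "K-300"), (4, "K-400")],
)

# Number the rows in a defined order, then keep the number of the matching row.
# A result of 0 means the key does not occur in the staging table.
# ROW_NUMBER() needs SQLite 3.25 or newer.
query = """
    SELECT MAX(CASE WHEN key_id = ? THEN rn ELSE 0 END)
    FROM (
        SELECT key_id, ROW_NUMBER() OVER (ORDER BY load_seq) AS rn
        FROM x_staging
    )
"""
position = conn.execute(query, ("K-300",)).fetchone()[0]
missing = conn.execute(query, ("K-999",)).fetchone()[0]
print(position, missing)
```

One pass over the table is enough this way; there is no need to move a ROWNUM cutoff around by hand.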

Related

Replacing a query with SELECT * FROM X WHERE Y is not NULL

What does this query try to achieve?
SELECT * FROM X WHERE (X.Y in (select Y from X))
As far as I figured, it is yielding me the same result as
SELECT * FROM X WHERE Y is not NULL
Is there anything more to the first query? The first query is actually very slow with a large dataset and hence I want to know whether I can replace it with the second query.
You are right, the two queries are equivalent.
It is unclear why the first query was written this way. Maybe it looked different once.
As is, your second query is better, because it is easier to read and understand (and even faster as you say).
Your second query is better than the first one.
In the first query, a row whose Y is NULL is dropped only implicitly, because comparing NULL with IN evaluates to unknown rather than true; the second query excludes NULL values in column Y explicitly.
So the two queries return the same rows, but the second one makes the NULL handling obvious.
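The equivalence is easy to check on a small table that contains a NULL. This is only a sketch using SQLite via Python's sqlite3 module, with made-up data; the id column is an assumption added so the rows can be told apart.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table X with one NULL in column y.
conn.execute("CREATE TABLE x (id INTEGER, y TEXT)")
conn.executemany(
    "INSERT INTO x VALUES (?, ?)",
    [(1, "a"), (2, None), (3, "b"), (4, "a")],
)

# First form: y IN (subquery over the same column).
in_rows = conn.execute(
    "SELECT id FROM x WHERE x.y IN (SELECT y FROM x) ORDER BY id"
).fetchall()

# Second form: explicit NULL check.
not_null_rows = conn.execute(
    "SELECT id FROM x WHERE y IS NOT NULL ORDER BY id"
).fetchall()

print(in_rows)
print(not_null_rows)
```

Both queries skip the row with id 2, because NULL IN (...) never evaluates to true.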

I need to query a database based on keywords

I need to query a postgresql database where there are keywords stored in the same row as the data I am trying to query. If it is queried on that keyword, that object is more likely, but not guaranteed to be the object queried. I want it to query about 10 items at a time, but I'm pretty sure I know how to do that(select top 10). So basically if the keyword is present it is more likely but not guaranteed to be the object queried. How do I do this?
I have a year of experience as a database developer but I don't know how to solve this problem. I would also be open to switching software if there are better suggestions. Thanks!!
So for example if the user searches on Apples then Data2 is more likely, but not guaranteed to be queried.
You want to select 10 rows, preferring those matching the keyword. So, order by match, then restrict to ten rows:
select *
from mytable
order by
case when keyword1 = 'Apples' then 0 else 1 end +
case when keyword2 = 'Apples' then 0 else 1 end +
case when keyword3 = 'Apples' then 0 else 1 end
fetch first 10 rows only;
Demo: https://dbfiddle.uk/?rdbms=postgres_8.4&fiddle=34758b94fe725f7f51a476e80c97187c
A row with a matching keyword is more likely, but not guaranteed, to be selected, because the query picks ten rows and makes arbitrary choices in case of ties. The linked demo shows one situation with fewer than ten matches and one with more than ten.
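If the fiddle link is not available, the ordering trick can be reproduced locally. The following is a rough sketch with SQLite through Python's sqlite3 module; the rows and the name column are invented, and LIMIT is used in place of fetch first because SQLite does not support the latter. The limit is 3 instead of 10 only to keep the sample data small.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: each row carries up to three keywords.
conn.execute(
    "CREATE TABLE mytable (name TEXT, keyword1 TEXT, keyword2 TEXT, keyword3 TEXT)"
)
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?)",
    [
        ("Data1", "Pears", "Plums", "Grapes"),
        ("Data2", "Apples", "Fruit", "Red"),
        ("Data3", "Kiwi", "Apples", "Apples"),
        ("Data4", "Bread", "Cheese", "Wine"),
    ],
)

# Each matching keyword lowers the sort value by one, so rows with more
# matches come first; rows with equal values come back in arbitrary order.
query = """
    SELECT name
    FROM mytable
    ORDER BY
        CASE WHEN keyword1 = :kw THEN 0 ELSE 1 END +
        CASE WHEN keyword2 = :kw THEN 0 ELSE 1 END +
        CASE WHEN keyword3 = :kw THEN 0 ELSE 1 END
    LIMIT 3
"""
top = [row[0] for row in conn.execute(query, {"kw": "Apples"})]
print(top)
```

Data3 (two matches) and Data2 (one match) always come first; the third slot goes to whichever non-matching row the database happens to return.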

What is "Select -1", and how is it different from "Select 1"?

I have the following query that is part of a common table expression. I don't understand the function of the "Select -1" statement. It is obviously different than the "Select 1" that is used in "EXISTS" statements. Any ideas?
select days_old,
count(express_cd),
count(*),
case
when round(count(express_cd)*100.0/count(*),2) < 1 then '0'
else ''
end ||
cast(decimal(round(count(express_cd)*100.0/count(*),2),5,2) as varchar(7)) ||
'%'
from foo.bar
group by days_old
union all
select -1, -- Selecting the -1 here
count(express_cd),
count(*),
case
when round(count(express_cd)*100.0/count(*),2) < 1 then '0'
else ''
end ||
cast(decimal(round(count(express_cd)*100.0/count(*),2),5,2) as varchar(7)) ||
'%'
from foo.bar
where days_old between 1 and 7
It's just selecting the number "minus one" for each row returned, just like "select 1" will select the number "one" for each row returned.
There is nothing special about the "select 1" syntax used in EXISTS statements, by the way; it's just selecting some arbitrary value, because EXISTS requires a record to be returned and a record needs data; the number 1 is sufficient.
Why you would do this, I have no idea.
When you have a union statement, each part of the union must contain the same columns. From what I read when I look at this, the first statement is giving you one line for each days_old value and then some stats for that value. The second part of the union is giving you a summary of all the records that are a week old or less. Since the days_old column is not relevant there, they put in a fake value as a placeholder in order to do the union. Of course, this is just a guess based on reading thousands of queries through the years. To be sure, I would need to actually run the code.
Since you say this is a CTE, to really understand why this is is happening, you may need to look at the data it generates and how that data is used in the next query that uses the CTE. That might answer your question.
What you have asked is basically about a business rule unique to your company. The true answer should lie in any requirements documents for the original creation of the code. You should go look for them and read them. We can make guesses based on our own experience but only people in your company can answer the why question here.
If you can't find the documentation, then you need to talk (Yes directly talk, preferably in person) to the Stakeholders who use the data and find out what their needs were. Only do this after running the code and analyzing the results to better understand the meaning of the data returned.
Based on your query, all the records with days_old between 1 and 7 are summarized in a single row whose first column is -1; that is all select -1 does, nothing special. In EXISTS there is no difference between select -1 and select 1 either: EXISTS only checks whether any row is returned, whatever value is selected.
Back to your query: it uses union all, and both selects return the same four columns. My guess is that the task is to get one result row per days_old value and to combine that with one extra row covering everything between 1 and 7.
It is just a grouping logic there.
Your query returns aggregated data (counts and rounded percentages) grouped by the days_old column, plus one more group for the data where days_old is between 1 and 7.
So -1 is just a label for that additional group; it cannot be 1, because days_old=1 is already a valid group.
The result will look like this:
row1: days_old=1 count(*)=2 ...
row2: days_old=3 count(*)=5 ...
row3: days_old=9 count(*)=6 ...
row4: days_old=-1 count(*)=7
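To see the placeholder in action, here is a small sketch in SQLite via Python's sqlite3 module. The bar table contents are invented to reproduce the counts listed above, and the percentage column is left out to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: 2 rows for day 1, 5 rows for day 3, 6 rows for day 9.
conn.execute("CREATE TABLE bar (days_old INTEGER, express_cd TEXT)")
rows = (
    [(1, "E"), (1, None)]
    + [(3, "E")] * 2 + [(3, None)] * 3
    + [(9, "E")] * 6
)
conn.executemany("INSERT INTO bar VALUES (?, ?)", rows)

# First branch: one row per days_old value.
# Second branch: one summary row for days 1 to 7, labelled with -1
# so that it lines up with the days_old column of the first branch.
query = """
    SELECT days_old, COUNT(express_cd), COUNT(*)
    FROM bar
    GROUP BY days_old
    UNION ALL
    SELECT -1, COUNT(express_cd), COUNT(*)
    FROM bar
    WHERE days_old BETWEEN 1 AND 7
"""
result = sorted(conn.execute(query).fetchall())
print(result)
```

The -1 row counts 7 records (2 + 5), and it cannot collide with a real days_old group as long as days_old is never negative.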

Getting a unique value from an aggregated result set

I've got an aggregated query that checks if I have more than one record matching certain conditions.
SELECT RegardingObjectId, COUNT(*) FROM [CRM_MSCRM].[dbo].[AsyncOperationBase] a
where WorkflowActivationId IN ('55D9A3CF-4BB7-E311-B56B-0050569512FE',
'1BF5B3B9-0CAE-E211-AEB5-0050569512FE',
'EB231B79-84A4-E211-97E9-0050569512FE',
'F0DDF5AE-83A3-E211-97E9-0050569512FE',
'9C34F416-F99A-464E-8309-D3B56686FE58')
and StatusCode = 10
group by RegardingObjectId
having COUNT(*) > 1
That's nice, but there is one field in AsyncOperationBase that will be unique. Say count(*) = 3; then AsyncOperationBaseId will have 3 different values, since AsyncOperationBaseId is the table's primary key.
To be honest, I would not even know what terms and expressions to Google to find a solution.
Does anyone have a solution? Also, are there any words to describe what I'm looking for? Perhaps BI people are often faced with such a requirement or something...
I could do it with an SSRS report where the report would visually do the grouping then I could expand each grouped row to get the AsyncOperationBaseId value, but simply through SQL, I can't seem to find a way out...
Thanks.
select * from [CRM_MSCRM].[dbo].[AsyncOperationBase]
where RegardingObjectId in
(
SELECT RegardingObjectId
FROM [CRM_MSCRM].[dbo].[AsyncOperationBase] a
where WorkflowActivationId IN
(
'55D9A3CF-4BB7-E311-B56B-0050569512FE',
'1BF5B3B9-0CAE-E211-AEB5-0050569512FE',
'EB231B79-84A4-E211-97E9-0050569512FE',
'F0DDF5AE-83A3-E211-97E9-0050569512FE',
'9C34F416-F99A-464E-8309-D3B56686FE58'
)
and StatusCode = 10
group by RegardingObjectId
having COUNT(*) > 1
)
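A scaled-down sketch of this pattern, using SQLite through Python's sqlite3 module. The async_op table, its columns and the 'W1'/'W2' workflow ids are stand-ins invented for the example. Note that the outer query has no filter of its own, so it returns every row for the selected RegardingObjectId values, including rows that did not count towards the HAVING condition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified stand-in for AsyncOperationBase.
conn.execute(
    "CREATE TABLE async_op "
    "(id INTEGER PRIMARY KEY, regarding_id TEXT, workflow_id TEXT, status_code INTEGER)"
)
conn.executemany(
    "INSERT INTO async_op VALUES (?, ?, ?, ?)",
    [
        (1, "A", "W1", 10),
        (2, "A", "W1", 10),
        (3, "B", "W1", 10),
        (4, "A", "W2", 30),  # same object, but does not match the inner filter
        (5, "C", "W1", 10),
        (6, "C", "W1", 10),
    ],
)

# Inner query: objects with more than one matching record.
# Outer query: the individual rows (and their primary keys) for those objects.
query = """
    SELECT id
    FROM async_op
    WHERE regarding_id IN (
        SELECT regarding_id
        FROM async_op
        WHERE workflow_id IN ('W1')
          AND status_code = 10
        GROUP BY regarding_id
        HAVING COUNT(*) > 1
    )
    ORDER BY id
"""
ids = [row[0] for row in conn.execute(query)]
print(ids)
```

Row 4 shows up as well; if only the matching records are wanted, the workflow and status conditions would have to be repeated in the outer where clause.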

Selecting top n Oracle records with ROWNUM still valid in subquery?

I have the following FireBird query:
update hrs h
set h.plan_week_id=
(select first 1 c.plan_week_id from calendar c
where c.calendar_id=h.calendar_id)
where coalesce(h.calendar_id,0) <> 0
(Intention: For records in hrs with a (non-zero) calendar_id
take calendar.plan_week_id and put it in hrs.plan_week_id)
The trick to select the first record in Oracle is to use WHERE ROWNUM=1, and if I understand correctly, I do not have to use ROWNUM in a separate outer query because I 'only' match ROWNUM=1 - thanks SO for suggesting Questions that may already have your answer ;-)
This would make it
update hrs h
set h.plan_week_id=
(select c.plan_week_id from calendar c
where (c.calendar_id=h.calendar_id) and (rownum=1))
where coalesce(h.calendar_id,0) <> 0
I'm actually using the 'first record' together with the selection of only one field to guarantee that I get one value back which can be put into h.plan_week_id.
Question: Will the above query work under Oracle as intended?
Right now, I do not have a filled Oracle DB at hand to run the query on.
Like Nicholas Krasnov said, you can test it in SQL Fiddle.
But if you ever find yourself about to use where rownum = 1 in a subquery, alarm bells should go off, because in 90% of the cases you are doing something wrong. Very rarely will you need a random value. A rownum = 1 is only valid when all selected values are the same.
In this case I expect calendar_id to be a primary key in calendar. Therefore, only one plan_week_id can be selected for each record in hrs, so the where rownum = 1 is not required.
And to answer your question: Yes, it will run just fine. Though the parentheses around each condition in the where clause are also not required and in fact only confusing (to me).
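As a rough illustration of the update without rownum = 1, here is a sketch in SQLite via Python's sqlite3 module. It is not Oracle, the sample rows are invented, and it assumes calendar_id is the primary key of calendar as suggested above, so the correlated subquery can return at most one value per hrs row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumption: calendar_id is the primary key of calendar.
conn.execute("CREATE TABLE calendar (calendar_id INTEGER PRIMARY KEY, plan_week_id INTEGER)")
conn.execute("CREATE TABLE hrs (id INTEGER PRIMARY KEY, calendar_id INTEGER, plan_week_id INTEGER)")
conn.executemany("INSERT INTO calendar VALUES (?, ?)", [(1, 101), (2, 102)])
conn.executemany(
    "INSERT INTO hrs VALUES (?, ?, ?)",
    [(1, 1, None), (2, 2, None), (3, 0, None), (4, None, None)],
)

# Correlated subquery: look up the plan_week_id for each hrs row.
# Rows with a zero or NULL calendar_id are left untouched.
conn.execute("""
    UPDATE hrs
    SET plan_week_id = (
        SELECT c.plan_week_id
        FROM calendar c
        WHERE c.calendar_id = hrs.calendar_id
    )
    WHERE COALESCE(calendar_id, 0) <> 0
""")
updated = conn.execute("SELECT id, plan_week_id FROM hrs ORDER BY id").fetchall()
print(updated)
```

If calendar_id were not unique in calendar, the subquery could return several rows, and that is the situation in which picking one with rownum = 1 would hide a data problem.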