max function in R not giving desired Date results - sql

I am trying to use Max function to get the record with latest dates but it does not give desired results as it also gives data with old dates.Below is output of dataframe OA_Output:
Org_ID ORG_OFFICIAL_NORM_NAME ORG_IMMEDIATE_PARENT_SRC ORG_IP_SRC_MD_Date
-------- ---------------------- ------------------------ ------------------
132693 BOLLE INCORPORATED abc.com 26-JUN-18
122789 BEE STINGER, LLC aa.com 12-Mar-18
344567 CALIBER COMPANY xyz.com 16-Feb-16
639876 Maruti yy.com 23-Jun-17
I am running below R Code to get the records with latest dates:
gautam1 <-
sqldf(
"
SELECT ORG_OFFICIAL_NORM_NAME,ORG_IMMEDIATE_PARENT_SRC
,MAX(ORG_IP_SRC_MD_DATE),ROW_ID
FROM OA_output
where ROW_ID = 1
and ORG_IMMEDIATE_PARENT_SRC like '%exhibit21%'
GROUP BY ORG_IMMEDIATE_PARENT_ORGID
ORDER BY ORG_IMMEDIATE_PARENT_ORGID
" )
In The above code max function is not giving desired results.I am not sure whether there is a date format issue between Oracle and R.
Any help will be appreciated
Thanks
Gautam

can you please use below query as a sql and run again code
SELECT ORG_OFFICIAL_NORM_NAME,ORG_IMMEDIATE_PARENT_SRC
,MAX(ORG_IP_SRC_MD_DATE),ROW_ID
FROM OA_output
where ROW_ID = 1
and ORG_IMMEDIATE_PARENT_SRC like '%exhibit21%'
GROUP BY ORG_OFFICIAL_NORM_NAME,ORG_IMMEDIATE_PARENT_SRC,ROW_ID
ORDER BY ORG_IMMEDIATE_PARENT_ORGID

Related

Why I cant use the column name in the alias when i opered with dates

Currently I am migrating a database from SQL_SERVER to SPARK using HIVE_SQL.
I had an issue when im trying to pass a number to a date format.I found the answer is:
from_unixtime(unix_timestamp(cast(DATE as string) , 'dd-MM-yyyy'))
When I execute this query it bring me the data, notice that iI put an alias different to the name of column FECHA :
SELECT FROM_UNIXTIME(UNIX_TIMESTAMP(CAST(FECHA AS STRING ) ,'yyyyMMdd'), 'yyyy-MM-dd') AS FECHA_1
FROM reportes_hechos_avisos_diarios
LIMIT 1
| FECHA_1 |
| -------- |
| 2019-01-01 |
But when I put the same alias as the column name it bring me an incosistent information:
SELECT FROM_UNIXTIME(UNIX_TIMESTAMP(CAST(FECHA AS STRING ) ,'yyyyMMdd'), 'yyyy-MM-dd') AS FECHA
FROM reportes_hechos_avisos_diarios
LIMIT 1
| FECHA |
| -------- |
| 2.019 |
I know the trivial answer is , put an alias that doesnt be the same as the column name, but i have an implementation in Tableau that feeds from this query and Its complicated to change this columns because basically i must change all implementation so I need to preserve the column name.This query works for me in SQL SERVER, but i dont know why doesnt works in hive.
Issue
ExpectedResult
PSDT:Thanks for your attention, this is the first question I ask in stack and my native language is not English, sorry if I had grammatical errors.
limit 1 without order by can produce non-deterministic results from run to run because the order of rows is random due to parallel execution, some factors may affect it somehow but getting the same row is not guaranteed.
What is happening - I guess you receiving different row and the date is corrupted in that row, this is why some weird result is returned.
Also, you can another method of conversion:
select date(regexp_replace(cast(20200101 as string),'(\\d{4})(\\d{2})(\\d{2})','$1-$2-$3')) --put your column instead of constant.
Result:
2020-01-01

How to get the set size, first and last record in a db2 ordered set with one call

I have a very big transaction table on DB2 v11, and I need to query a subset of it as efficiently as possible. All I need is the total count of the set (not known in advance, it's based on criteria, lets say 1 day) and the ID of the first record, and the ID of the last record.
The old code was fetching the entire table, then just using the 1st record ID, and the last record ID, and size, and not making use of the rest. Now this code is timing out. It's a complex query of several joins.
IS there a way to just fetch the size of the set, 1st record, last record all in one select query ?
I've read that reordering the list in order to fetch the 1st record(so fetch with Desc, then change to Asc) is not efficient.
sample table 1 TRANSACTION_RECORDS:
tdID TIMESTAMP name
-------------------------------
123 2020-03-31 john
234 2020-03-31 dan
456 2020-03-01 Eve
675 2020-04-01 joy
sample table 2 TRANSACTION_TYPE:
invoiceId tdID account
------------------------------
897 123 abc
898 123 def
877 234 mnc
899 456 opp
Sample query
select Min(tr.transaction_id), Max(tr.transaction_id)
from TRANSACTION_RECORDS TR
join TRANSACTION_TYPE TT
on TR.tdID=tt.tdID
WHERE Date(TR.TIMESTAMP) = '2020-03-31'
group by tr.tdID
order by TR.tdID ASC
This results in multiple columns, (but it requires the group by)
123,123
234,234
456,456
What I want is:
123,456
As I mentioned in the comments, for this query you don't need Group BY and neither Order by, just do:
select Min(tr.transaction_id), Max(tr.transaction_id)
from TRANSACTION_RECORDS TR
join TRANSACTION_TYPE TT
on TR.tdID=tt.tdID
WHERE Date(TR.TIMESTAMP) = '2020-03-31'
It should work as expected

oracle - sql query select max from each base

I'm trying to solve this query where i need to find the the top balance at each base. Balance is in one table and bases are in another table.
This is the existing query i have that returns all the results but i need to find a way to limit it to 1 top result per baseID.
SELECT o.names.name t.accounts.bidd.baseID, MAX(t.accounts.balance)
FROM order o, table(c.accounts) t
WHERE t.accounts.acctype = 'verified'
GROUP BY o.names.name, t.accounts.bidd.baseID;
accounts is a nested table.
this is the output
Name accounts.BIDD.baseID MAX(T.accounts.BALANCE)
--------------- ------------------------- ---------------------------
Jerard 010 1251.21
john 012 3122.2
susan 012 3022.2
fin 012 3022.2
dan 010 1751.21
What i want the result to display is calculate the highest balance for each baseID and only display one record for that baseID.
So the output would look only display john for baseID 012 because he has the highest.
Any pointers in the right direction would be fantastic.
I think the problem is cause of the "Name" column. since you have three names mapped to one base id(12), it is considering all three records as unique ones and grouping them individually and not together.
Try to ignore the "Name" column in select query and in the "Group-by" clause.
SELECT t.accounts.bidd.baseID, MAX(t.accounts.balance)
FROM order o, table(c.accounts) t
WHERE t.accounts.acctype = 'verified'
GROUP BY t.accounts.bidd.baseID;

GROUP BY and aggregate function query

I am looking at making a simple leader board for a time trial. A member may perform many time trials, but I only want for their fastest result to be displayed. My table columns are as follows:
Members { ID (PK), Forename, Surname }
TimeTrials { ID (PK), MemberID, Date, Time, Distance }
An example dataset would be:
Forename | Surname | Date | Time | Distance
Bill Smith 01-01-11 1.14 100
Dave Jones 04-09-11 2.33 100
Bill Smith 02-03-11 1.1 100
My resulting answer from the example above would be:
Forename | Surname | Date | Time | Distance
Bill Smith 02-03-11 1.1 100
Dave Jones 04-09-11 2.33 100
I have this so far, but access complains that I am not using Date as part of an aggregate function:
SELECT Members.Forename, Members.Surname, Min(TimeTrials.Time) AS MinOfTime, TimeTrials.Date
FROM Members
INNER JOIN TimeTrials ON Members.ID = TimeTrials.Member
GROUP BY Members.Forename, Members.Surname, TimeTrials.Distance
HAVING TimeTrials.Distance = 100
ORDER BY MIN(TimeTrials.Time);
IF I remove the Date from the SELECT the query works (without the date). I have tried using FIRST upon the TimeTrials.Date, but that will return the first date which is normally incorrect.
Obviously putting the Date as part of the GROUP BY would not return the result set that I am after.
Make this task easier on yourself by starting with a smaller piece of the problem. First get the minimum Time from TimeTrials for each combination of MemberID and Distance.
SELECT
tt.MemberID,
tt.Distance,
Min(tt.Time) AS MinOfTime
FROM TimeTrials AS tt
GROUP BY
tt.MemberID,
tt.Distance;
Assuming that SQL is correct, use it in a subquery which you join back to TimeTrials again.
SELECT tt2.*
FROM
TimeTrials AS tt2
INNER JOIN
(
SELECT
tt.MemberID,
tt.Distance,
Min(tt.Time) AS MinOfTime
FROM TimeTrials AS tt
GROUP BY
tt.MemberID,
tt.Distance
) AS sub
ON
tt2.MemberID = sub.MemberID
AND tt2.Distance = sub.Distance
AND tt2.Time = sub.MinOfTime
WHERE tt2.Distance = 100
ORDER BY tt2.Time;
Finally, you can join that query to Members to get Forename and Surname. Your question shows you already know how to do that, so I'll leave it for you. :-)

JavaDB: get ordered records in the subquery

I have the following "COMPANIES_BY_NEWS_REPUTATION" in my JavaDB database (this is some random data just to represent the structure)
COMPANY | NEWS_HASH | REPUTATION | DATE
-------------------------------------------------------------------
Company A | 14676757 | 0.12345 | 2011-05-19 15:43:28.0
Company B | 454564556 | 0.78956 | 2011-05-24 18:44:28.0
Company C | 454564556 | 0.78956 | 2011-05-24 18:44:28.0
Company A | -7874564 | 0.12345 | 2011-05-19 15:43:28.0
One news_hash may relate to several companies while a company can relate to several news_hashes as well. Reputation and date are bound to the news_hash.
What I need to do is calculate the average reputation of last 5 news for every company. In order to do that I somehow feel that I need to user 'order by' and 'offset' in a subquery as shown in the code below.
select COMPANY, avg(REPUTATION) from
(select * from COMPANY_BY_NEWS_REPUTATION order by "DATE" desc
offset 0 rows fetch next 5 row only) as TR group by COMPANY;
However, JavaDB allows neither ORDER BY, nor OFFSET in a subquery. Could anyone suggest a working solution for my problem please?
Which version of JavaDB are you using? According to the chapter TableSubquery in the JavaDB documentation, table subqueries do support order by and fetch next, at least in version 10.6.2.1.
Given that subqueries can be ordered and the size of the result set can be limited, the following (untested) query might do what you want:
select COMPANY, (select avg(REPUTATION)
from (select REPUTATION
from COMPANY_BY_NEWS_REPUTATION
where COMPANY = TR.COMPANY
order by DATE desc
fetch first 5 rows only))
from (select distinct COMPANY
from COMPANY_BY_NEWS_REPUTATION) as TR
This query retrieves all distinct company names from COMPANY_BY_NEWS_REPUTATION, then retrieves the average of the last five reputation rows for each company. I have no idea whether it will perform sufficiently, that will likely depend on the size of your data set and what indexes you have in place.
If you have a list of unique company names in another table, you can use that instead of the select distinct ... subquery to retrieve the companies for which to calculate averages.