I have the query below:
SELECT MAX(C.EFCTV_DT) FROM lk1 C,lk2 B WHERE
C.MKT_cd = B.MKT_cd AND C.RC_TYPE_CD = 'SYAS' AND
C.TIER_CD = B.TIER_CD AND C.EFCTV_DT <= '2016-02-02'
I am trying to fetch the greatest date less than the given date '2016-02-02'. Hive doesn't support MAX. Please advise.
If I want to find the max reldate less than some date, I use this:
select reldate from Base where reldate < '2014-11-09' ORDER BY reldate DESC LIMIT 1
This may not be the best solution. An alternative is to write a Hive UDF, which is very simple.
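A minimal sketch of the top-1 approach, using Python's built-in sqlite3 module as a stand-in for Hive (an assumption made only so the example runs anywhere; the sample rows are invented). Note the descending sort: it is what makes LIMIT 1 return the latest date rather than the earliest. The result is compared against MAX() under the same filter.

```python
import sqlite3

# In-memory SQLite database standing in for Hive (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Base (reldate TEXT)")
# Hypothetical sample rows; ISO-8601 strings sort chronologically as text.
conn.executemany(
    "INSERT INTO Base (reldate) VALUES (?)",
    [("2014-10-01",), ("2014-11-08",), ("2014-11-09",), ("2014-12-25",)],
)

cutoff = "2014-11-09"

# Latest date strictly before the cutoff: descending sort, then keep one row.
top_row = conn.execute(
    "SELECT reldate FROM Base WHERE reldate < ? ORDER BY reldate DESC LIMIT 1",
    (cutoff,),
).fetchone()[0]

# Same question answered with the MAX() aggregate.
max_row = conn.execute(
    "SELECT MAX(reldate) FROM Base WHERE reldate < ?", (cutoff,)
).fetchone()[0]

print(top_row, max_row)
```

Both queries should agree; one difference is that the LIMIT form returns no row at all when nothing matches, while MAX() returns a single NULL.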
I'm quite new to SQL and BigQuery, so this might be simple. I'm running some queries on the public GDELT dataset in BQ and have a question about LIMIT. GDELT is massive (14.4 TB), and when I query for something, in this case a person, I can get 100k rows of results or more, which in this case is too much. But when I use LIMIT, it does not seem to spread the results evenly over the dates, which gives me very random timelines. How does LIMIT work, and is there a way to get results distributed more evenly across days?
SELECT DATE,V2Tone,DocumentIdentifier as URL, Themes, Persons, Locations
FROM `gdelt-bq.gdeltv2.gkg_partitioned`
WHERE DATE>=20210610000000 and _PARTITIONTIME >= TIMESTAMP(#start_date)
AND DATE<=20210818999999 and _PARTITIONTIME <= TIMESTAMP(#end_date)
AND LOWER(DocumentIdentifier) like #url_topic
LIMIT #limit
When running this query and doing some preproc, I get the following time series:
It's based on 15k results, but they are distributed very unevenly/randomly across the days (since there are over 500k results in total if I don't use limit). I would like to make a query that limits my output to 15k but partitions the data somewhat equally over the days.
You need an ORDER BY. When you do not sort your result, the order of the returned rows is not guaranteed.
But if you are looking to get the same number of rows per day, you can use window functions:
select * from (
SELECT
DATE,
V2Tone,
DocumentIdentifier as URL,
Themes,
Persons,
Locations,
row_number() over (partition by DIV(DATE, 1000000)) rn
FROM `gdelt-bq.gdeltv2.gkg_partitioned`
WHERE
DATE >= 20210610000000 AND DATE <= 20210818999999
and _PARTITIONDATE >= #start_date and _PARTITIONDATE <= #end_date
AND LOWER(DocumentIdentifier) like #url_topic
) t where rn <= #numberofrowsperday
If you are passing only a date, you can use _PARTITIONDATE to filter the partitions.
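Here is a rough, runnable sketch of the rows-per-day idea, using Python's sqlite3 in place of BigQuery (an assumption; window functions need SQLite 3.25 or newer). The table gkg, the column ts, and the sample rows are hypothetical. It assumes the timestamp is an integer in YYYYMMDDHHMMSS form, so integer division by 1000000 yields the day, and it keeps rows whose row number is at most the per-day cap.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gkg (ts INTEGER, url TEXT)")
# Hypothetical rows: YYYYMMDDHHMMSS timestamps, unevenly spread over 3 days.
rows = (
    [(20210610000000 + i * 100, f"a{i}") for i in range(5)]
    + [(20210611000000 + i * 100, f"b{i}") for i in range(2)]
    + [(20210612000000 + i * 100, f"c{i}") for i in range(4)]
)
conn.executemany("INSERT INTO gkg VALUES (?, ?)", rows)

rows_per_day = 3

query = """
SELECT day, url FROM (
    SELECT
        ts / 1000000 AS day,   -- integer division drops the HHMMSS part
        url,
        ROW_NUMBER() OVER (PARTITION BY ts / 1000000 ORDER BY ts) AS rn
    FROM gkg
) AS t
WHERE rn <= ?                  -- keep at most N rows for each day
ORDER BY day
"""

# Count how many rows survive for each day.
per_day = {}
for day, url in conn.execute(query, (rows_per_day,)):
    per_day[day] = per_day.get(day, 0) + 1

print(per_day)
```

Days with fewer rows than the cap simply return what they have, so the total can come in under days times cap.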
So this is my table structure and data.
Now I want to filter the data by month using the ExpenseDate column.
How can I achieve that?
I was trying
select * from tblExpenses where (ExpenseDate = MONTH('April'))
But it throws an error: "Conversion failed when converting date and/or time from character string."
Please help. Thank you.
You are putting month() on the wrong side of the comparison. It should be applied to ExpenseDate:
select *
from tblExpenses
where month(ExpenseDate) = 4;
Note that month() returns a number, not the name of the month.
I think it is more likely that you want records from a particular April, not every April. This would be expressed as:
where ExpenseDate >= '2018-04-01' and ExpenseDate < '2018-05-01'
I think your where clause is just reversed. I think you want this (and change the month name to a number):
select * from tblExpenses where Month(ExpenseDate) = 4
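To make the difference between "every April" and "one particular April" concrete, here is a small sketch using Python's sqlite3 as a stand-in for SQL Server (an assumption). SQLite has no month() function, so strftime('%m', ...) plays that role here; the Amount column and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblExpenses (ExpenseDate TEXT, Amount REAL)")
# Hypothetical sample rows spanning two years.
conn.executemany(
    "INSERT INTO tblExpenses VALUES (?, ?)",
    [
        ("2017-04-15", 10.0),
        ("2018-03-31", 20.0),
        ("2018-04-01", 30.0),
        ("2018-04-30", 40.0),
        ("2018-05-01", 50.0),
    ],
)

# Every April, any year: SQLite's counterpart to month(ExpenseDate) = 4.
any_april = conn.execute(
    "SELECT ExpenseDate FROM tblExpenses "
    "WHERE CAST(strftime('%m', ExpenseDate) AS INTEGER) = 4 "
    "ORDER BY ExpenseDate"
).fetchall()

# April 2018 only: half-open range, start inclusive, end exclusive.
april_2018 = conn.execute(
    "SELECT ExpenseDate FROM tblExpenses "
    "WHERE ExpenseDate >= '2018-04-01' AND ExpenseDate < '2018-05-01' "
    "ORDER BY ExpenseDate"
).fetchall()

print(any_april)
print(april_2018)
```

The range form compares the column directly, which generally lets the database use an index on ExpenseDate, whereas wrapping the column in a function usually does not.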
I'm struggling to get the correct result with this query:
select max(kts.my_date), kts.name
join ktt on ktt.someId = kts.someOtherId
where ktt.someId = 'example'
group by kts.name;
I have two (possibly stupid) questions:
Will this max() take time into account? I know that order by does if the dates are the same. Does max do the same?
This is connected to my previous question, but when I run the query above, if the dates are same, it orders it by the name. I want the latest date at the top. Do I need to put an order by clause for the date in? If so, using Max is pointless, right?
Thanks for the help.
Yes, max() takes the time portion into account.
--2
select max(kts.my_date) over (partition by kts.name) as maxdate, kts.name
from -- choose your table
join ktt on ktt.someId = kts.someOtherId
where ktt.someId = 'example'
order by -- choose your column here
Give this a try.
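A quick way to check both points is a sketch like the one below, with Python's sqlite3 standing in for the real database (an assumption). The join to ktt is left out for brevity and the rows are invented. Two rows for the same name share a date but differ in time, and the result is sorted by the aggregated date so the latest comes first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kts (name TEXT, my_date TEXT)")
# Hypothetical rows: same calendar date, different times of day.
conn.executemany(
    "INSERT INTO kts VALUES (?, ?)",
    [
        ("alpha", "2020-01-05 08:00:00"),
        ("alpha", "2020-01-05 17:30:00"),
        ("beta", "2020-01-05 09:15:00"),
    ],
)

# MAX() picks the latest timestamp per name; ORDER BY puts the latest first.
latest = conn.execute(
    "SELECT name, MAX(my_date) AS maxdate FROM kts "
    "GROUP BY name ORDER BY maxdate DESC"
).fetchall()

print(latest)
```

So max() and order by do different jobs: max() picks one value per group, and order by decides how the groups are listed.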
I have the information below in a table and want to retrieve the count of rows where the difference between the two dates is >= 1.
Id testdate exdate
1 20120502 20120501 --> This should be included, because diff is 1
2 20120601 20120601 --> This should not be included, because diff is 0
3 20120704 20120703 --> This should be included, because diff is 1
4 20120803 20120802 --> This should be included, because diff is 1
Based on the above data, my select count should return 3.
I am trying the following, but it's not giving any results:
select count(to_char(testdate,'YYYYMMDD')-to_char(exdate,'YYYYMMDD')) from test ;
select count(*)
from my_table
where testdate <> exdate
You really should convert those to a date data-type though... it saves a lot of problems in the long run.
Your query will give you results. It will return 4. It gives you results because as long as the result of testdate - exdate is not null it will return a value for that row.
However, as you're not using dates Oracle will most probably convert those to numbers, which won't help for date comparisons should you do that in the future.
20120901 - 20120831 = 70 -- not 1
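The arithmetic above is easy to verify outside the database. This is a small Python sketch (not Oracle code; parse_yyyymmdd is a helper made up for the example) contrasting plain number subtraction with real date subtraction:

```python
from datetime import datetime


def parse_yyyymmdd(value):
    """Turn a YYYYMMDD number or string into a date object."""
    return datetime.strptime(str(value), "%Y%m%d").date()


# Treating the values as plain numbers crosses the month boundary badly.
numeric_diff = 20120901 - 20120831

# Treating them as dates gives the real gap in days.
date_diff = (parse_yyyymmdd(20120901) - parse_yyyymmdd(20120831)).days

print(numeric_diff, date_diff)
```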
Okay, from your comment:
Working with, if i use select count(*) from test where
to_char((testdate,'YYYYMMDD') - to_char(exdate,'YYYYMMDD')) >= 1; .But
count is one of the column.how to retrive above select statement as
one of the column
you're trying something completely different.
Your dates are actually dates; it's helpful to post this. You're looking for an analytic function, specifically count().
select a.*, count(*) over ( partition by 1 ) as ct
from my_table a
where trunc(exdate) <> trunc(testdate)
Note the trunc function, which, without additional parameters, removes the time portion of the date, thus enabling a direct comparison without resorting to converting the date to a character.
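For illustration only, here is the same idea sketched with Python's sqlite3 instead of Oracle (an assumption; needs SQLite 3.25 or newer for window functions). SQLite's date() stands in for trunc(), an empty OVER () plays the same role as partition by 1, and the rows are invented so that one pair falls on the same day at different times.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER, testdate TEXT, exdate TEXT)")
# Hypothetical rows with a time portion; id 2 has both values on the same day.
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?)",
    [
        (1, "2012-05-02 10:00:00", "2012-05-01 09:00:00"),
        (2, "2012-06-01 10:00:00", "2012-06-01 18:00:00"),
        (3, "2012-07-04 10:00:00", "2012-07-03 10:00:00"),
        (4, "2012-08-03 10:00:00", "2012-08-02 23:59:59"),
    ],
)

# The window count is computed over the rows that survive the WHERE clause,
# so every returned row carries the total as an extra column.
result = conn.execute(
    "SELECT a.id, COUNT(*) OVER () AS ct "
    "FROM my_table a "
    "WHERE date(a.exdate) <> date(a.testdate) "
    "ORDER BY a.id"
).fetchall()

print(result)
```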
select count(*)
from test
where to_date(testdate,'YYYYMMDD') - to_date(exdate,'YYYYMMDD') >= 1;
or
select count(*)
from test
where to_date(testdate,'YYYYMMDD') <> to_date(exdate,'YYYYMMDD');
Looking at testdate and exdate, it looks more like the columns are VARCHAR type, so you would need the appropriate date conversion.
In Oracle, if the type is date, you can do arithmetic with them: 1 equals 1 day, and 1/24 equals 1 hour.
Your case is rather easy because you could even compare the strings.
SELECT count(*)
FROM test
WHERE testdate <> exdate
But it sounds like you want the threshold to be variable, so you should convert them to dates, and then you can do:
SELECT count(*)
FROM test
WHERE to_date(testdate,'YYYYMMDD')-to_date(exdate,'YYYYMMDD') >= 1
I am not sure what you want if testdate minus exdate is -1 or lower, because the exdate is after testdate. In that case you can work with ABS:
SELECT count(*)
FROM test
WHERE ABS(to_date(testdate,'YYYYMMDD')-to_date(exdate,'YYYYMMDD')) >= 1
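The signed and ABS variants can be compared with a sketch like this one, again using Python's sqlite3 rather than Oracle (an assumption). The to_days function registered below is a made-up helper that stands in for to_date(..., 'YYYYMMDD'), and row 5 is an invented extra row where exdate falls after testdate.

```python
import sqlite3
from datetime import datetime


def to_days(value):
    """Convert a YYYYMMDD string to a day number so differences are in days."""
    return datetime.strptime(value, "%Y%m%d").toordinal()


conn = sqlite3.connect(":memory:")
# Register the hypothetical helper so it can be called from SQL.
conn.create_function("to_days", 1, to_days)
conn.execute("CREATE TABLE test (id INTEGER, testdate TEXT, exdate TEXT)")
conn.executemany(
    "INSERT INTO test VALUES (?, ?, ?)",
    [
        (1, "20120502", "20120501"),
        (2, "20120601", "20120601"),
        (3, "20120704", "20120703"),
        (4, "20120803", "20120802"),
        (5, "20120901", "20120902"),  # invented row: exdate after testdate
    ],
)

# Signed difference: only rows where testdate is at least a day later.
signed_count = conn.execute(
    "SELECT COUNT(*) FROM test WHERE to_days(testdate) - to_days(exdate) >= 1"
).fetchone()[0]

# Absolute difference: rows at least a day apart in either direction.
abs_count = conn.execute(
    "SELECT COUNT(*) FROM test "
    "WHERE ABS(to_days(testdate) - to_days(exdate)) >= 1"
).fetchone()[0]

print(signed_count, abs_count)
```

With the four rows from the question the signed version gives the expected count; the ABS version additionally picks up the reversed row.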
I have a fairly large table in which one of the columns is a date column. The query I execute is as follows.
select max(date) from tbl where date < to_date('10/01/2010','MM/DD/YYYY')
That is, I want to find the value closest to and less than a particular date. This takes considerable time because of the max on the large table. Is there a faster way to do this? Maybe using LAST_VALUE?
Put an index on the date column and the query should be plenty fast.
1) Add an index to the date column. Simply put, an index allows the database engine to store information about the data so it will speed up most queries where that column is one of the clauses. Info here http://docs.oracle.com/cd/B28359_01/server.111/b28310/indexes003.htm
2) Consider adding a second clause to the query. You have where date < to_date('10/01/2010','MM/DD/YYYY') now, why not change it to:
where date < to_date('10/01/2010','MM/DD/YYYY') and date > to_date('09/30/2010', 'MM/DD/YYYY')
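since this will reduce the number of scanned rows. Note that this only works if you know a row exists inside that window; otherwise the query returns NULL.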
Try
select date from (
select date from tbl where date < to_date('10/01/2010','MM/DD/YYYY') order by date desc
) where rownum = 1
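As a rough check that the indexed max() and the "sort descending, take the first row" form agree, here is a sketch using Python's sqlite3 in place of Oracle (an assumption; LIMIT 1 stands in for the rownum = 1 wrapper). The column is named d instead of date, and the index name and sample data are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (d TEXT)")
# Hypothetical data: the 1st to the 28th of every month in 2010.
conn.executemany(
    "INSERT INTO tbl VALUES (?)",
    [("2010-%02d-%02d" % (m, day),) for m in range(1, 13) for day in range(1, 29)],
)
# Index on the date column, as suggested in the answers above.
conn.execute("CREATE INDEX idx_tbl_d ON tbl (d)")

cutoff = "2010-10-01"

# Aggregate form.
via_max = conn.execute(
    "SELECT MAX(d) FROM tbl WHERE d < ?", (cutoff,)
).fetchone()[0]

# Top-1 form: sort descending and keep the first row.
via_top1 = conn.execute(
    "SELECT d FROM tbl WHERE d < ? ORDER BY d DESC LIMIT 1", (cutoff,)
).fetchone()[0]

print(via_max, via_top1)
```

Whether either form actually uses the index depends on the database and its optimizer, so it is worth checking the execution plan on the real table.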