How to improve Drill performance - Hive

I have a table in a Hive data warehouse. When I read that table using Drill with this query:
select row_key, payload, update_time from parsed_table where update_time between '2017-01-01 00:00:00' and '2017-01-01 23:59:59' and `web-socket_key` = '359569050289017/AINU'
it fetches 3,890 rows in 15.288 seconds. Could anyone suggest what I should do to improve query performance?

Related

Get rows inserted during time range in BigQuery

In BigQuery Legacy SQL it is possible to get the rows inserted during a time range using a table decorator:
#legacySQL
SELECT COUNT(*) FROM [PROJECT_ID:DATASET.TABLE#time1-time2]
With standard SQL you can do a point-in-time query like this:
SELECT *
FROM t
FOR SYSTEM_TIME AS OF '2017-01-01 10:00:00-07:00';
But I can't find a way to query for a time range as in Legacy SQL.

Improve SELECT and UPDATE performance

I'll try to explain my issue, since I'm not writing SQL directly: I'm using the INFORMATICA tool, with mappings that process SQL data, so I'll translate the logic of my mapping into SQL.
My mapping basically selects data from an SCD (slowly changing dimension) where start_date = sysdate and ind = 1 (this table has approximately 600 million records), using this query:
SELECT table.ACCOUNT_NUMBER, table.SUB_ACCOUNT_NUMBER, table.SUB_ACCOUNT_KEY
FROM table
WHERE table.CURR_IND=1
AND table.START_DATE=trunc(sysdate)
This table is indexed as follows:
SUB_ACCOUNT_KEY - UNIQUE
Then I add another column and update a different table that has approximately 8 million records. That query is roughly an update with a join:
SET table_2.ind = The_New_Column, table_2.sub_account_key = table.sub_account_key
WHERE table.account_number = table_2.account_number
AND table.sub_account_number = table_2.sub_account_number
Table_2 is indexed as follows:
(ACCOUNT_NUMBER, SUB_ACCOUNT_NUMBER) - UNIQUE
Both the select and the update take some time to process, depending on the amount of data we get each day (once every three months there is a day with roughly 30x the normal volume, which takes forever, about 2 hours).
So, my question is: how can I speed this process up, given the following limitation:
I can't (unless given a very good reason) add an index to these tables, since they are used by many other processes and extra indexes could harm their performance.
suggestion 1: create a function based index:
CREATE INDEX index_name
ON table (TRUNC(START_DATE));
As you mentioned, this might not be possible because you can't add indexes.
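To see what a function-based index buys you, here is a minimal sketch using SQLite as a stand-in for Oracle (SQLite supports indexes on expressions; the table and column names are invented, and substr(...) plays the role of TRUNC on a date):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE accounts (sub_account_key INTEGER, start_date TEXT)")
# Index on the expression substr(start_date, 1, 10), analogous in spirit to
# Oracle's CREATE INDEX ... ON table (TRUNC(START_DATE)).
con.execute("CREATE INDEX ix_start_day ON accounts (substr(start_date, 1, 10))")
con.executemany(
    "INSERT INTO accounts VALUES (?, ?)",
    [(i, f"2016-02-{14 + i % 2:02d} 08:30:00") for i in range(100)],
)

# A query that filters on the exact same expression can use the index.
plan = con.execute(
    "EXPLAIN QUERY PLAN SELECT sub_account_key FROM accounts "
    "WHERE substr(start_date, 1, 10) = '2016-02-14'"
).fetchall()
print(plan)
```

The plan output should name ix_start_day, confirming the expression index is used instead of a full scan.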
suggestion 2: use BETWEEN:
SELECT table.ACCOUNT_NUMBER, table.SUB_ACCOUNT_NUMBER, table.SUB_ACCOUNT_KEY
FROM table
WHERE table.CURR_IND=1
AND table.START_DATE BETWEEN TO_DATE('2016.02.14 12:00:00 AM', 'YYYY.MM.DD HH:MI:SS AM')
AND TO_DATE('2016.02.15 11:59:59 PM', 'YYYY.MM.DD HH:MI:SS PM');
(see also http://oraclecoder.com/tutorials/quick-tip-do-not-use-trunc-to-filter-on-a-date-and-time-field--2120)
This is essentially the same question you asked under "get current date formatted". You are either going to have to modify your SQL or use a function-based index. Yes, indexes can add some overhead on DML, but they can give dramatic improvements on SELECTs. Like all design decisions, you have to weigh benefit against cost and decide what is more important.
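The trade-off the answer describes can be seen in miniature with SQLite (a stand-in for Oracle; schema invented): a plain index on the column is usable by a BETWEEN range predicate, but not when the column is wrapped in a function, the TRUNC problem in another guise:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (start_date TEXT, payload TEXT)")
con.execute("CREATE INDEX ix_start ON t (start_date)")
con.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(f"2016-02-{d:02d} 10:00:00", "x") for d in range(1, 29)],
)

# Wrapping the column in a function hides it from the plain index ...
plan_fn = con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM t "
    "WHERE substr(start_date, 1, 10) = '2016-02-14'"
).fetchall()
# ... while a range predicate on the bare column can seek into it.
plan_range = con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM t "
    "WHERE start_date BETWEEN '2016-02-14 00:00:00' AND '2016-02-14 23:59:59'"
).fetchall()
print(plan_fn)
print(plan_range)
```

The first plan is a full scan; the second is an index search on ix_start.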

My simple oracle query takes too long to complete

I'm trying to run the following SQL statement in Oracle, and it is taking too long to finish.
Here is my query:
SELECT timestamp FROM data
WHERE timestamp IN
  (SELECT MIN(timestamp) FROM data
   WHERE timestamp BETWEEN :t1 AND :t2)
If anyone can help with optimising this query I would be very grateful.
All you need to speed your query is an index on timestamp:
create index data_timestamp on data(timestamp);
If you are expecting only one result, you can also do:
SELECT MIN(timestamp)
FROM data
WHERE TIMESTAMP BETWEEN :t1 AND :t2
I'm not sure why you would want to output the timestamp multiple times.
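The simplified form of the answer can be checked with a quick sketch (SQLite stand-in for Oracle; table contents invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE data (timestamp TEXT)")
# The index the answer recommends; MIN over a range becomes an index seek.
con.execute("CREATE INDEX data_timestamp ON data (timestamp)")
con.executemany(
    "INSERT INTO data VALUES (?)",
    [(f"2020-01-01 {h:02d}:00:00",) for h in range(24)],
)

# The rewritten query returns the single earliest timestamp in the range.
row = con.execute(
    "SELECT MIN(timestamp) FROM data WHERE timestamp BETWEEN ? AND ?",
    ("2020-01-01 05:30:00", "2020-01-01 20:00:00"),
).fetchone()
print(row[0])  # -> 2020-01-01 06:00:00
```

One row out, no self-join, and the index satisfies both the range filter and the MIN.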

Sql server full text search performance with additional conditions

We have a performance problem with SQL Server (2008 R2) full-text search. When we add additional WHERE conditions to the full-text search condition, it gets too slow.
Here is my simplified query:
SELECT * FROM Calls C
WHERE (C.CallTime BETWEEN '2013-08-01 00:00:00' AND '2013-08-07 00:00:00')
AND CONTAINS(CustomerText, '("efendim")')
The Calls table's primary key is CallId (int, clustered index), and Calls is also indexed by CallTime. We have 16,000,000 rows, and CustomerText is about 10 KB per row.
When I look at the execution plan, it first computes the full-text search result set and then joins it with the Calls table on CallId. Because of that, the more rows the full-text result set has, the slower the query gets (over a minute).
This is the execution plan:
When I run the WHERE conditions separately, the CallTime condition returns 360,000 rows:
SELECT COUNT(*) FROM Calls C
WHERE (C.CallTime BETWEEN '2013-08-01 00:00:00' AND '2013-08-07 00:00:00')
and the CONTAINS condition returns 1,200,000 rows:
SELECT COUNT(*) FROM Calls C
WHERE CONTAINS(AgentText, '("efendim")')
So, what can I do to increase the performance of my query?
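The shape described above, full-text hits materialized first and then joined back by key and filtered by time, can be reproduced in miniature with SQLite's FTS5 (all names and data here are invented; this only illustrates the query shape, not SQL Server's planner):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE calls (call_id INTEGER PRIMARY KEY, call_time TEXT)")
# A separate full-text table keyed by rowid, standing in for the FT index.
con.execute("CREATE VIRTUAL TABLE customer_text USING fts5(body)")

rows = [
    (1, "2013-08-02 09:00:00", "merhaba efendim"),
    (2, "2013-08-05 14:00:00", "efendim nasilsiniz"),
    (3, "2013-09-01 10:00:00", "efendim tekrar"),
    (4, "2013-08-03 11:00:00", "baska bir konu"),
]
for call_id, call_time, body in rows:
    con.execute("INSERT INTO calls VALUES (?, ?)", (call_id, call_time))
    con.execute("INSERT INTO customer_text(rowid, body) VALUES (?, ?)",
                (call_id, body))

# Full-text hits are produced first, then joined back to calls by key
# and filtered by time, the same shape the question describes.
hits = con.execute(
    "SELECT calls.call_id FROM customer_text "
    "JOIN calls ON calls.call_id = customer_text.rowid "
    "WHERE customer_text MATCH 'efendim' "
    "AND calls.call_time BETWEEN '2013-08-01 00:00:00' "
    "AND '2013-08-07 00:00:00' "
    "ORDER BY calls.call_id"
).fetchall()
print([r[0] for r in hits])  # -> [1, 2]
```

Call 3 matches the text but not the time range, and call 4 the reverse; only calls 1 and 2 survive both filters.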
If Calls is indexed and sorted by call time, then instead of:
WHERE (C.CallTime BETWEEN '2013-08-01 00:00:00' AND '2013-08-07 00:00:00')
you can find the first index position with a call time greater than '2013-08-01 00:00:00' and the last index position with a call time smaller than '2013-08-07 00:00:00',
and your new condition becomes:
WHERE (C.CallTime BETWEEN first_index AND last_index)
which is faster than comparing dates.
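The answer's idea of translating the time bounds into positions in the sorted data can be sketched with Python's bisect (purely illustrative; this is essentially what an index seek does internally, and the data is made up):

```python
from bisect import bisect_left, bisect_right

# Call times already sorted, as they would be in an index on CallTime.
call_times = [
    "2013-07-30 10:00:00",
    "2013-08-01 08:15:00",
    "2013-08-03 12:00:00",
    "2013-08-06 17:45:00",
    "2013-08-09 09:30:00",
]

# First position at or after the lower bound,
# last position at or before the upper bound.
first_index = bisect_left(call_times, "2013-08-01 00:00:00")
last_index = bisect_right(call_times, "2013-08-07 00:00:00") - 1

in_range = call_times[first_index:last_index + 1]
print(first_index, last_index)  # -> 1 3
```

Two binary searches replace a per-row date comparison; everything between the two positions is in range by construction.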

How to fetch data from oracle database in hourly basis

I have a table in my database, say mytable, which contains requests coming from another source. One column in this table, Time, stores the date and time (e.g. 2010/07/10 01:21:43) when the request was received. Now I want to fetch the data from this table on an hourly basis for each day, i.e. the count of requests the database received in each hour of a day; e.g. for 1 o'clock to 2 o'clock, say the count is 50, and so on. I will run this query at the end of the day, so I will get the requests received in a day, grouped by hour.
Can anybody help me with this?
I want a query that takes less time to fetch the data, as my database is huge.
Is there any other way than OMG Ponies' answer?
Use the TO_CHAR function to format the DATE column, so you can GROUP BY it for aggregate functions:
SELECT TO_CHAR(t.time, 'YYYY-MM-DD HH24') AS hourly,
COUNT(*) AS numPerHour
FROM YOUR_TABLE t
GROUP BY TO_CHAR(t.time, 'YYYY-MM-DD HH24')
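The TO_CHAR bucketing above carries over directly to other engines; here is a minimal sketch using SQLite's strftime in the same role (table name, column name, and data are invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE requests (time TEXT)")
con.executemany("INSERT INTO requests VALUES (?)", [
    ("2010-07-10 01:21:43",),
    ("2010-07-10 01:59:01",),
    ("2010-07-10 02:05:00",),
    ("2010-07-10 02:30:10",),
    ("2010-07-10 02:45:55",),
])

# Format the timestamp down to the hour and group on that expression,
# the same idea as TO_CHAR(t.time, 'YYYY-MM-DD HH24') in Oracle.
rows = con.execute(
    "SELECT strftime('%Y-%m-%d %H', time) AS hourly, COUNT(*) "
    "FROM requests GROUP BY hourly ORDER BY hourly"
).fetchall()
print(rows)  # -> [('2010-07-10 01', 2), ('2010-07-10 02', 3)]
```

Each distinct hour string becomes one group, so the result is one row per hour with its request count.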
Why don't you create another table that stores the count and the date? Create a database job that runs hourly and inserts the count and sysdate into the new table. Your code will then just query the new table.
create table othertable_count (no_of_rows number, time_of_count date);
Your database job, which runs hourly, will be something like:
insert into othertable_count
select count(1), sysdate
from othertable;
And you will query the table othertable_count instead of querying your original table.
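The job described above boils down to periodically appending one (count, timestamp) row and letting reports hit the small table; a minimal sketch (SQLite stand-in for Oracle, with the table names from the answer):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE othertable (payload TEXT)")
con.execute(
    "CREATE TABLE othertable_count (no_of_rows INTEGER, time_of_count TEXT)"
)

def hourly_snapshot(con):
    # What the hourly database job would do: count the big table once
    # and append the result together with the current time.
    con.execute(
        "INSERT INTO othertable_count "
        "SELECT COUNT(*), datetime('now') FROM othertable"
    )

con.executemany("INSERT INTO othertable VALUES (?)", [("req",)] * 5)
hourly_snapshot(con)

# Reporting queries now read the tiny summary table, not the big one.
count = con.execute(
    "SELECT no_of_rows FROM othertable_count "
    "ORDER BY time_of_count DESC LIMIT 1"
).fetchone()[0]
print(count)  # -> 5
```

The cost of the full count is paid once per hour by the job instead of once per report.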
SELECT To_char(your_column, 'YYYY-MM-DD HH24') AS hourly,
Count(*) AS numPerHour
FROM your_table
WHERE your_column > TO_DATE('1-DEC-18', 'DD-MON-RR')
AND your_column < TO_DATE('4-DEC-18', 'DD-MON-RR')
GROUP BY To_char(your_column, 'YYYY-MM-DD HH24')
ORDER BY hourly;