What would be the TTL on records if we open the DB multiple times with the same TTL?

As per the Time to Live documentation:
Open1 at t=0 with ttl=4 and insert k1, k2; close at t=2. Open2 at t=3 with ttl=5. Now k1, k2 should be deleted at t>=5.
In my use case:
I open RocksDB with ttl=4 at t=0 and insert k1, k2.
I close the DB at t=2.
I open the DB again at t=4 with ttl=5 and insert k1, k3.
What would the TTL of k1, k2 and k3 be now? Does the TTL of records depend on how many times the DB has been opened?
I was thinking the TTL of all the records (k1, k2, k3) would change because we have opened the DB multiple times, even with the same TTL.
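If I'm reading the documented behaviour quoted above correctly, the TTL that matters is whichever one the DB is currently open with, while each record keeps the timestamp of when it was last written. Here is a toy Python model of that rule (this is not the RocksDB API, just a sketch of the semantics; in the real database the expiry check only happens during compaction):

import time

class TtlDbModel:
    """Toy model: each put stores a write timestamp; expiry uses the current open's TTL."""
    def __init__(self):
        self.store = {}      # key -> (value, write_timestamp)
        self.ttl = None

    def open(self, ttl):
        self.ttl = ttl       # only the TTL of the latest open matters

    def put(self, key, value, now=None):
        now = time.time() if now is None else now
        self.store[key] = (value, now)   # re-inserting a key refreshes its timestamp

    def compact(self, now=None):
        now = time.time() if now is None else now
        # drop a record once write_timestamp + current_ttl <= now
        self.store = {k: (v, ts) for k, (v, ts) in self.store.items()
                      if ts + self.ttl > now}

# Timeline from the question:
db = TtlDbModel()
db.open(ttl=4)                               # open at t=0 with ttl=4
db.put("k1", "v1", now=0)
db.put("k2", "v2", now=0)
db.open(ttl=5)                               # reopened at t=4 with ttl=5
db.put("k1", "v1", now=4)
db.put("k3", "v3", now=4)
db.compact(now=6)
print(sorted(db.store))                      # ['k1', 'k3'] - k2 expired (0 + 5 <= 6)

Under that reading, k2 (last written at t=0) becomes eligible for deletion at t>=5, while k1 and k3 (written at t=4) live until t>=9 under the ttl=5 open; reopening by itself doesn't reset anything, only rewriting a key does.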

How can I paginate the results of GEOSEARCH?

I am following the tutorial on https://redis.io/commands/geosearch/ and I have successfully migrated ~300k records (from an existing pg database) into the testkey key (sorry for the unfortunate name, but I am just testing it out!).
However, a query returning items within a 5 km radius yields thousands of items. I'd like to limit the number of items to 10 at a time, and be able to load the next 10 using some sort of keyset pagination.
So, to limit the results I am using:
GEOSEARCH testkey FROMLONLAT -122.2612767 37.7936847 BYRADIUS 5 km WITHDIST COUNT 10
How can I execute GEOSEARCH queries with pagination?
Some context: I have a postgres + postgis database with ~3m records, and a service that fetches items within a radius; even with the right indexes it is starting to get sluggish. My other endpoints can handle 3-8k rps, while this one can barely handle 1500 (8 ms average query execution time). I am exploring moving items into a redis cache, either the entire payload or just the IDs, and then running an IN query against postgres (<1 ms query time).
I am struggling to find any articles on this with a Google search.
You can use GEOSEARCHSTORE to create a sorted set with the results from your search. You can then paginate this sorted set with ZRANGE. This is shown as an example on the GEOSEARCHSTORE page:
redis> GEOSEARCHSTORE key2 Sicily FROMLONLAT 15 37 BYBOX 400 400 km ASC COUNT 3 STOREDIST
(integer) 3
redis> ZRANGE key2 0 -1 WITHSCORES
1) "Catania"
2) "56.441257870158204"
3) "Palermo"
4) "190.44242984775784"
5) "edge2"
6) "279.7403417843143"
redis>
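The same two-step approach from Python with redis-py would look roughly like this (a sketch, assuming redis-py 4.x and Redis 6.2+, since that is where GEOSEARCHSTORE was introduced; the destination key name and the 60-second snapshot expiry are arbitrary choices for illustration):

import redis

r = redis.Redis()

# Materialize the search once into a sorted set scored by distance ...
r.geosearchstore(
    "testkey:nearby",            # hypothetical destination key
    "testkey",
    longitude=-122.2612767,
    latitude=37.7936847,
    radius=5,
    unit="km",
    sort="ASC",
    storedist=True,              # use the distance as the sorted-set score
)
r.expire("testkey:nearby", 60)   # let the snapshot expire after a minute

# ... then page through the snapshot with ZRANGE.
def fetch_page(page, page_size=10):
    start = page * page_size
    return r.zrange("testkey:nearby", start, start + page_size - 1, withscores=True)

print(fetch_page(0))   # closest 10
print(fetch_page(1))   # next 10

Each ZRANGE page is O(log N + page size), so paging through the stored snapshot stays cheap even when the search matches thousands of members.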

pypyodbc SQL query of a cloud-stored MS Access database is slow when querying the newest data but fast when querying the oldest data

I'm using pypyodbc and pandas.read_sql_query to query a cloud-stored MS Access database (.accdb) file.
import pypyodbc
import pandas as pd
from datetime import datetime

def query_data(group_id, dbname=r'\\cloudservername\myfile.accdb', table_names=['ContainerData']):
    start_time = datetime.now()
    print(start_time)

    pypyodbc.lowercase = False
    conn = pypyodbc.connect(
        r"Driver={Microsoft Access Driver (*.mdb, *.accdb)};" +
        r"DBQ=" + dbname + r";")
    connection_time = datetime.now() - start_time
    print("Connection Time: " + str(connection_time))

    querystring = ("SELECT TOP 10 Column1, Column2, Column3, Column4 FROM " +
                   table_names[0] + " WHERE Column0 = " + group_id)
    my_data = pd.read_sql_query(querystring, conn)
    print("Query Time: " + str(datetime.now() - start_time - connection_time))

    conn.close()
    return my_data
The database has about 30,000 rows. The group_id values are sequential numbers from 1 to 3000, with 10 rows assigned to each group. For example, rows 1-10 in the database (the oldest data) all have group_id = 1, and the last 10 rows (the newest data) all have group_id = 3000.
When I store the database locally on my PC and run query_data('1') the connection time is 0.1s and the query time is 0.01s. Similarly, running query_data('3000') the connection time is 0.2s and the query time is 0.08s.
When the database is stored on the cloud server, the connection time varies from 20-60 s. When I run query_data('1') the query time is ~3 seconds. NOW THE BIG ISSUE: when I run query_data('3000') the query time is ~10 minutes!
I've tried using ORDER BY group_id DESC but that causes both queries to take ~ 10 minutes.
I've also tried changing the "Order By" on group_id to Descending in the .accdb itself and setting "Order By On Load" to Yes. Neither of these seems to change how the SQL query locates the data.
The problem is that the code I'm using almost always needs to find the newest data (i.e. the maximum group_id), which takes the longest to find. Is there a way to have the SQL query reverse its search order, so that the newest entries are looked through first rather than the oldest? I wouldn't mind a 3-second (or even 1-minute) query time, but a 10-minute query time is too long. Or is there a setting I can change in the Access database to change the order in which the data is stored?
I've also watched the network monitor while running the script, and python.exe steadily sends about 2kb/s and receives about 25kb/s throughout the full 10 minute duration of the script.

Does BigQuery charge for querying only the streaming buffer?

I have a day-partitioned table with approximately 300k rows in the streaming buffer. When running an interactive, non-cached, standard SQL query using
SELECT .. FROM .. WHERE _PARTITIONTIME IS NULL
The query validator says:
Valid: This query will process 0 B when run.
And after executing, the job information tab says:
Bytes Processed 0 B
Bytes Billed 0 B
The query is certainly returning real-time results each time I run it. Is this actually a free operation?
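A quick way to see both numbers for yourself is the Python client (a sketch, assuming google-cloud-bigquery and a hypothetical day-partitioned table name; the dry run returns the same estimate the validator shows, and the finished job exposes what was actually processed and billed):

from google.cloud import bigquery

client = bigquery.Client()    # assumes default project and credentials

sql = """
    SELECT *
    FROM `my_project.my_dataset.my_partitioned_table`  -- hypothetical table
    WHERE _PARTITIONTIME IS NULL                        -- rows still in the streaming buffer
"""

# Dry run: the cost estimate without executing anything.
dry_config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
print("estimated:", client.query(sql, job_config=dry_config).total_bytes_processed)

# Real run: compare what the job actually processed and billed.
job = client.query(sql, job_config=bigquery.QueryJobConfig(use_query_cache=False))
rows = list(job.result())
print("processed:", job.total_bytes_processed, "billed:", job.total_bytes_billed, "rows:", len(rows))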

SQL increment between a range without duplicates - multithreaded

I'll try to provide as much detail as I can on this. What is the best way to increment through a range of numbers in a high-transaction environment, meaning calls come in very fast from a web API?
The first table I have is a master table of the ranges that are available to use; it's shown below. I'm not quite sure this is the best way to implement it either, so I'm open to suggestions.
Company A initially gave me a range of 100-200. Over time, we started to run low so they gave us a new range. The new range is 201-300.
Company    Range      Inactive
A          100-200    X
B          100-200
C          200-350
A          201-300
The second table is the list of numbers that have been used between the ranges.
Company    Number    DateUsed
A          198       2017-11-30
B          199       2017-11-30
A          200       2017-11-30
B          105       2017-11-30
C          215       2017-11-30
A          201       2017-11-30
Once a range is used up, I need to be able to flag that range so it isn't used anymore and move on to the next available range. I was thinking of adding a "Last Used" number to the first table and doing an UPDATE statement with an OUTPUT clause, plus a CASE expression that sets the Inactive flag once the range is exhausted.
The question is: what is the best way to do this in a high-transaction environment? I'm familiar with SCOPE_IDENTITY, but I don't think it will work in this setup.
Having an inactive flag in the first table seems perfectly sane to me. I've written and tested a query that will update your flag, provided the Range column is split into a lower range column and upper range column. I'm calling the tables Ranges and RangeLog respectively.
UPDATE Ranges
SET Inactive = 'X'
WHERE EXISTS (SELECT tA.Company, RangeLow, RangeHigh, COUNT(*)
              FROM Ranges tA
              INNER JOIN (SELECT DISTINCT Company, Number
                          FROM RangeLog) tB
                      ON tA.Company = tB.Company
                     AND (Number BETWEEN tA.RangeLow AND tA.RangeHigh)
              GROUP BY tA.Company, RangeLow, RangeHigh
              HAVING RangeHigh - RangeLow + 1 = COUNT(*)
                 AND Ranges.Company = tA.Company
                 AND Ranges.RangeLow = tA.RangeLow
                 AND Ranges.RangeHigh = tA.RangeHigh)
Obviously this won't work with the schema as you have it now, but splitting the Range column makes the data more atomic and usable, and makes queries easier to write. And it's a fairly minor modification to your table.
Let us know how this looks to you!
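As for handing out the numbers themselves under load, here is a minimal sketch of the UPDATE ... OUTPUT idea from the question, assuming SQL Server and pyodbc, the split RangeLow/RangeHigh columns above, and a hypothetical LastUsed column seeded to RangeLow - 1 (none of these names beyond what's shown come from the original schema):

import pyodbc

# One atomic statement both advances the counter and returns the allocated number,
# so two concurrent callers can never receive the same value.
ALLOCATE_SQL = """
UPDATE TOP (1) Ranges
SET LastUsed = LastUsed + 1,
    Inactive = CASE WHEN LastUsed + 1 = RangeHigh THEN 'X' ELSE Inactive END
OUTPUT inserted.LastUsed
WHERE Company = ?
  AND Inactive IS NULL          -- assumes a blank Inactive is stored as NULL
  AND LastUsed < RangeHigh;
"""

def allocate_number(conn, company):
    cur = conn.cursor()
    row = cur.execute(ALLOCATE_SQL, company).fetchone()
    conn.commit()
    if row is None:
        raise RuntimeError("no active range left for company " + company)
    return row[0]

conn = pyodbc.connect("DSN=mydb")   # hypothetical connection string
print(allocate_number(conn, "A"))

Note that TOP (1) simply grabs whichever active range for that company it finds first; if the ranges must be consumed in a specific order, you'd want to target the row with a keyed subquery instead.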

Removing several tables from a SQL Server 2005 database - make a new DB or use the old one?

I'm removing a bunch of tables from a SQL Server DB to move them to an archive DB. The current DB has a couple of filegroups and has been handling the growth of the remaining tables okay. I'll be removing multiple gigabytes, though.
Should I make a new db and move the current tables in there? I'm paranoid about not setting growth right.
There is really only one table that sees a lot of activity, and that goes up to about 14,000 rows in a four-month period.
It really comes down to what you want to do. If you use a new DB, your existing applications may have to change their connection strings to reflect the new DB name.
If you are worried about the growth setting, pick a number close to your projected growth and then watch for autogrow events on the data and log files using the default trace; a query for this is given below. Nobody gets the size exactly right: you make your best guess from the data available to you and monitor the growth. If the query returns any rows, adjust the settings to bump the numbers appropriately. Also, 14,000 rows in a four-month period is NOT considered active at all compared to what SQL Server can handle.
DECLARE @filename VARCHAR(255);
SELECT @filename = SUBSTRING(path, 0, LEN(path) - CHARINDEX('\', REVERSE(path)) + 1) + '\Log.trc'
FROM sys.traces
WHERE is_default = 1;

--Check whether the data and log files auto-grew.
SELECT
    gt.ServerName
    , gt.DatabaseName
    , gt.TextData
    , gt.StartTime
    , gt.Success
    , gt.HostName
    , gt.NTUserName
    , gt.NTDomainName
    , gt.ApplicationName
    , gt.LoginName
FROM sys.fn_trace_gettable(@filename, DEFAULT) gt
JOIN sys.trace_events te ON gt.EventClass = te.trace_event_id
JOIN sys.databases d ON gt.DatabaseName = d.name
WHERE EventClass IN (92, 93) --'Data File Auto Grow', 'Log File Auto Grow'
ORDER BY StartTime;