Currently I am backing up my Derby database using the SYSCS_UTIL.SYSCS_BACKUP_DATABASE procedure. However, because the database has grown quite large, this sometimes fails, and I am wondering whether it is possible to back up only data that is newer than x days, where x is a settable value.
I would greatly appreciate any hints towards achieving this goal, if it is possible.
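For reference, the call I'm using now is just the standard full backup (the path below is only an example). As far as I can tell, the only related built-in option is the variant that also enables log archive mode for roll-forward recovery, which is still not a date-filtered backup:
-- Current full backup; '/backup/mydb' is a placeholder directory.
CALL SYSCS_UTIL.SYSCS_BACKUP_DATABASE('/backup/mydb');
-- Related built-in variant: full backup plus log archive mode, so archived
-- transaction logs can later be replayed on top of this backup.
-- The second argument (1) deletes online archived logs from older backups.
CALL SYSCS_UTIL.SYSCS_BACKUP_DATABASE_AND_ENABLE_LOG_ARCHIVE_MODE('/backup/mydb', 1);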
I have SQL 2012 tabular and multidimensional models which are currently being processed through SQL Agent jobs. All the models are processed with the 'Process Full' option every day. However, some of the models are taking a long time to process. Can anyone tell me which processing option is best and will not affect the performance of the SQL instance?
Without taking a look at your DB it is hard to know, but maybe I can give you a couple of hints:
It depends on how the data in the DB is being updated. If all the fact table data is deleted and inserted every night, Process Full is probably the best way to go. But maybe you can partition the data by date and process only the affected partitions.
On a multidimensional model you should check whether aggregations are taking too much time. If so, you should consider redesigning them.
On tabular models I found that sometimes having unnecessarily big varchar fields can take huge amounts of time and memory to process. I found Kasper de Jonge's server memory analysis very helpful for identifying this kind of problem:
http://www.powerpivotblog.nl/what-is-using-all-that-memory-on-my-analysis-server-instance/bismservermemoryreport/
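If you just want a quick look before setting up that workbook, a DMV query against the Analysis Services instance along these lines should list memory usage per object (I'm writing the column names from memory, so treat it as a sketch); large dictionary/column objects usually point to those oversized varchar fields:
-- Run against the SSAS instance, e.g. from an SSMS MDX query window.
SELECT OBJECT_PARENT_PATH,
       OBJECT_ID,
       OBJECT_MEMORY_SHRINKABLE,
       OBJECT_MEMORY_NONSHRINKABLE
FROM $SYSTEM.DISCOVER_OBJECT_MEMORY_USAGE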
I have recently discovered MonetDB and I am evaluating it for an internal project, so my questions are probably from a real newbie's point of view. Maybe someone could point me to a site and/or document where I could find more info (I haven't found much by googling).
Regarding scalability, please correct me if I am wrong, but what I understand is that if I need to scale, I would launch more server instances and discover them from the control node. Is that right?
Is there any limit on the number of servers?
The other point is about storage: is it possible to use Amazon S3 to back read-only MonetDB instances?
Update: we would need to store a massive amount of Call Detail Records from different sources, on a read-only basis. We would aggregate/reduce that data for the day-to-day operation, accessing the bigger tables only when the full detail is required.
We would store the historical data as well, to perform longer-term analysis. My concern is mostly about memory; disk storage wouldn't be the issue, I think. If the hot dataset involved in a report/analysis eats up the whole memory space (fast response times are needed, and I am not sure how swapping would impact them), I would like to know whether I can scale somehow instead of re-engineering the report/analysis process (maybe I am biased by the horizontal scaling thing :-) ).
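To illustrate the kind of reduction I mean, the day-to-day reports would hit a daily rollup along these lines (table and column names are made up for the example):
-- Hypothetical raw table: cdr(call_ts TIMESTAMP, source_id INT, duration_sec INT).
-- Day-to-day queries would read this small rollup instead of the full CDR table.
CREATE TABLE cdr_daily AS
SELECT CAST(call_ts AS DATE) AS call_day,
       source_id,
       COUNT(*) AS calls,
       SUM(duration_sec) AS total_duration_sec
FROM cdr
GROUP BY CAST(call_ts AS DATE), source_id
WITH DATA;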
thanks!
You will easily find the advantages of MonetDB on the net, so let me highlight some disadvantages:
1. In MonetDB, deleting rows does not free up the space.
Solution: copy the data into another table, drop the existing table, and rename the new table (see the sketch at the end of this answer).
2. Joins are a little slower.
3. You cannot pass a table name as a dynamic variable.
E.g. if you have table names stored in one main table, you can't write SQL like "for each (select tablename from mytable) select data from tablename".
You can't write functions that take a table name as a variable argument.
But it is still damn fast and can store large amounts of data.
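A minimal sketch of the workaround from point 1, assuming a table called mytable (hypothetical name) and a MonetDB release that supports ALTER TABLE ... RENAME TO:
-- Rebuild the table to reclaim the space left behind by deletes.
CREATE TABLE mytable_new AS SELECT * FROM mytable WITH DATA;
DROP TABLE mytable;
ALTER TABLE mytable_new RENAME TO mytable;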
The task is to filter and analyze a huge amount of log files (around 8 TB) from a finished research project. The idea is to fill a database with the data to be able to run different analysis tasks later.
The values are stored comma separated. In principle each entry is an id, a timestamp, a type, and up to 5 values:
id, timestamp, type, v1, v2, v3, v4, v5
In a first try using MySQL, I used one table with one log entry per row, so there is no direct relation between the log values. The downside here is slow querying of subsets.
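For context, the one-entry-per-row table looks roughly like this; the column names follow the tuple above and the types are assumptions for the sketch (the actual table in the query further down also has logid and unit columns):
-- One row per log entry; types are assumed for this sketch.
CREATE TABLE log_entry (
    id BIGINT UNSIGNED NOT NULL,
    `timestamp` BIGINT NOT NULL, -- epoch milliseconds, as in the query below
    type VARCHAR(32),
    v1 DOUBLE, v2 DOUBLE, v3 DOUBLE, v4 DOUBLE, v5 DOUBLE
) ENGINE=InnoDB;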
Because there is no relation, I looked into alternatives like NoSQL databases, and column-based stores like HBase or Cassandra seemed to be a perfect fit for this kind of data. But these systems are made for huge distributed setups, which we do not have. In our case the analysis will run on a single machine or perhaps a few VMs.
Which kind of database would fit this task? Is it worth setting up a single-machine instance with Hadoop + HBase... or is this all a bit oversized?
What database would you choose to do high-performance logfile analysis?
EDIT: Maybe it is not clear from my question that we cannot spend money on cloud services or new hardware. The question is whether there are benefits in using NoSQL approaches instead of MySQL (especially for this data). If there are none, or if they are so small that the effort of setting up a NoSQL system is not worth the benefit, we can use our ESXi infrastructure and MySQL.
EDIT2: I'm still having the problem here. I did further experiments with MySQL and inserted just a quarter of all available data. The insert has now been running for over 2 days and is not yet finished. Currently there are 2,147,483,647 rows in my single-table DB. With indexes this takes 211.2 GiB of disk space. And this is just a quarter of all logging data...
A query of the form
SELECT * FROM `table` WHERE `timestamp`>=1342105200000 AND `timestamp`<=1342126800000 AND `logid`=123456 AND `unit`="UNIT40";
takes 761 seconds to complete, in this case returning one row.
There is a combined index on timestamp, logid, unit.
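As far as I understand it, that index corresponds to something like the first statement below, and because the query filters timestamp with a range before the equality columns, MySQL can only seek on the timestamp part of it; an ordering with the equality columns first would look like the second statement (both are untested sketches):
-- Existing combined index as described above (assumed DDL).
ALTER TABLE `table` ADD INDEX idx_ts_logid_unit (`timestamp`, `logid`, `unit`);
-- Alternative ordering: seek on the equality columns, then range-scan timestamp.
ALTER TABLE `table` ADD INDEX idx_logid_unit_ts (`logid`, `unit`, `timestamp`);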
So I think this is not the way to go, because later in the analysis I will have to get all entries in a time range and compare the data points.
I read about MongoDB and Redis, but the problem with them is that they are in-memory databases.
In the later analysis process there will be a very small amount of concurrent database access. In fact, the analysis will be run from one single machine.
I do not need redundancy. I would be able to regenerate the database in case of a failure.
Once the database is completely written, there would also be no need to update or add further rows.
What do you think about alternatives like Redis, MongoDB and so on? If I understand correctly, I would need RAM on the order of my data size...
Is this task even feasible with a single-node system, or maybe with two nodes?
Well, I personally would prefer the faster solution since, as you said, you need high-performance analysis. The problem is, if you have to set up a whole new system to do so and the performance improvement is minor in relation to the additional effort, then stay with SQL.
In our company we have a quite small database, containing not even half a GB of data, on a VM. The problem is that as soon as you use a VM you will get major performance issues; when opening the database on the VM you can go for a coffee in the meantime ;)
But if the time until the database is loaded into the cache is not so important, it doesn't matter. It all depends on how much faster you think the new system will be and how much effort you will have to put into it, but as I said, I'd prefer the faster solution if you have to go for "high-performance analysis".
I'm about to create 2 new SQL Server databases for our data warehouse:
Datawarehouse - where the data is stored
Datawarehouse_Stage - where the ETL is done
I'm expecting both databases to be about 30GB and to grow about 5GB per year. They probably will not get bigger than 80GB (which is when we'll start to archive).
I'm trying to decide what settings I should use when creating these databases:
what should the initial size be?
...and should I increase the database size straight after creating it?
what should the auto-growth settings be?
I'm after any best practice advice on creating those databases.
UPDATE: the reason I suggest increasing the database size straight after creating it is that you can't shrink a database to less than its initial size.
•what should the initial size be?
45GB? That is 30GB plus three years of growth, and it fits on a low-end, cheap SSD ;) Sizing is not an issue if your smallest SSD is 64GB.
...and should I increase the database size straight after creating it?
That would sort of be stupid, no? I mean, why create a DB with a small size just to resize it IMMEDIATELY afterwards, instead of putting the right size into the script in the first place?
what should the auto-growth settings be?
This is not a data warehouse question. NO AUTOGROW. Autogrow fragments your discs.
Make sure you format the discs according to best practices (64KB allocation unit size, aligned partitions).
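A minimal sketch of what that could look like at creation time; the file names and paths are placeholders and the log size is just a guess, but the data file gets the ~45GB from above and autogrow is switched off (FILEGROWTH = 0):
-- Pre-size the warehouse database up front and disable autogrow.
CREATE DATABASE Datawarehouse
ON PRIMARY
    (NAME = Datawarehouse_data,
     FILENAME = 'D:\Data\Datawarehouse.mdf',
     SIZE = 45GB,
     FILEGROWTH = 0)
LOG ON
    (NAME = Datawarehouse_log,
     FILENAME = 'L:\Log\Datawarehouse.ldf',
     SIZE = 5GB,
     FILEGROWTH = 0);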
I'm running software called Fishbowl Inventory on a Firebird database (Windows Server 2003). At the moment the Fishbowl software runs extremely slowly when more than one user accesses it. I'm thinking I may be able to speed up the application by forcing the database to run "in memory". However, I cannot find documentation on how to do this. Any help would be greatly appreciated.
Thank you in advance.
Robert
Firebird does not have memory tables - they may or may not be added in future versions (>3) but certainly not in the upcoming 2.5. There can be any other number of reasons why your software is slow with multiple users; however, Firebird itself has pretty good concurrency, so make sure you find the actual bottleneck first.
+1 to Holger. Find the bottleneck first.
Sinática Monitor may help you.
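If the server is on Firebird 2.1 or later, the monitoring tables are also a cheap way to see what the other users' sessions are doing while things are slow; a small sketch (the exact column set can vary by version):
-- Currently running statements and the attachments they belong to.
SELECT a.MON$USER,
       a.MON$REMOTE_ADDRESS,
       s.MON$STATE,
       s.MON$SQL_TEXT
FROM MON$STATEMENTS s
JOIN MON$ATTACHMENTS a ON a.MON$ATTACHMENT_ID = s.MON$ATTACHMENT_ID;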
In-memory tables are nice either for OLAP (when data is not changing) or for temporary internal data storage.
In both cases data loss is not a danger.
It's a pity that FB has no in-memory mode. I am thinking about using SQLite instead as a result.
As for caching, I think a simple parallel thread that reads all the blocks of the database file would effectively put it in memory - in the OS cache, if the OS has enough memory.
But I also think that the OS has already cached as much of the DB file as it could, and aggressively forcing it into the cache would make overall performance even worse.
I read an article some time ago from someone who set up a RAM drive (like in old DOS) and ran a database there. The problem is that if anything fails, you lose everything. You would have to do backups very often to ensure a minimum of safety.
Not a good idea at all, I think.