How to Represent a Single Statistic in SQL Database? - sql

I have an SQL table (in an SQLite database), Listing, that has a datetime field. My program needs to know the most recent datetime value it has seen. However, I don't store every listing the program sees in the database.
So my question is, what is the usual way to store data like "most recently seen object", which is a single record, into a database? Is there something more elegant than making another table that has one record with a datetime field?

First of all, you need to save all records into your database; then you can take the MAX(), as @GordonLinoff suggested. What is the problem with saving the data? If you want to keep the database size small, you can add a trigger that deletes the oldest rows during new INSERTs.
But I don't properly understand your example. Can you show us some of your code?
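As a sketch of both options in SQLite (the column name seen_at, the one-row table last_seen, and the upsert syntax, which needs SQLite 3.24+, are my assumptions, not taken from the question):

    -- Option 1: keep all listings and derive the statistic when needed.
    SELECT MAX(seen_at) FROM Listing;

    -- Option 2: a dedicated one-row table for the single statistic.
    CREATE TABLE IF NOT EXISTS last_seen (
        id      INTEGER PRIMARY KEY CHECK (id = 1),  -- constraint keeps the table at exactly one row
        seen_at TEXT NOT NULL                        -- ISO-8601 datetime string
    );

    -- Upsert whenever the program observes a listing, keeping only the newest value.
    INSERT INTO last_seen (id, seen_at)
    VALUES (1, '2013-06-01 12:00:00')
    ON CONFLICT (id) DO UPDATE
    SET seen_at = MAX(excluded.seen_at, last_seen.seen_at);

The one-row table with a CHECK on the key is about as elegant as a "single statistic" gets in SQL; the alternative is simply to keep the rows and let MAX() do the work.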

Related

multiple different record types within same file must be converted to sql table

One for the SQL data definition gurus:
I have a mainframe file that contains about 35-100 different record types. Depending on the record type, the layout is redefined, and any column can become a different length or type. I don't really want to split this thing up into 35-100 different tables and relate them together. I did find that Postgres has %ROWTYPE with cursor- or table-based records, but in all the examples the data looked the same. How can I set up a table to handle this, and what SQL queries would be needed to return the data? It doesn't have to be Postgres, but that was the only thing I could find that looked similar to my problem.
I would just make a table with all TEXT fields at first. TEXT is a variable-length type, so it only takes up the space it needs and performs very well. From there, you may find it quicker to move the data into better-formed tables if certain data is better served by a more specific data type.
It's easier to do it in this order, because bulk insert with COPY is very picky... so with TEXT you just worry about the number of columns and get it in there.
EDIT: I'm referring to Postgres with this answer. Not sure if you wanted another DB specific answer.
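As a rough sketch of that approach in Postgres (the table, file path, and column names here are all made up, and the typed table at the end is only illustrative):

    -- Staging table: every column is TEXT, so any record type loads without type errors.
    CREATE TABLE mainframe_staging (
        record_type TEXT,
        col01       TEXT,
        col02       TEXT,
        col03       TEXT
        -- ... as many generic columns as the widest record layout needs
    );

    -- Bulk load the raw file; COPY only has to match the column count here.
    COPY mainframe_staging FROM '/tmp/mainframe_extract.csv' WITH (FORMAT csv);

    -- Later, peel one record type off into a properly typed table.
    CREATE TABLE record_type_a AS
    SELECT col01::integer AS account_no,
           col02::date    AS posted_on,
           col03          AS description
    FROM   mainframe_staging
    WHERE  record_type = 'A';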

sql same column value for all rows and update it frequently

What I want to achieve:
I have a table holding an ID and the Credits for each individual ID.
I want to rate each ID as rate(ID) = Credit(ID) / sum(Credit(ID)), where the sum is over all IDs.
I will be updating the table quite frequently and want to keep sum(Credit(ID)) handy by creating another column (say sigmaID) that stores this sum; it should always have exactly the same value in every row.
Whenever I change the Credit for an ID (say, add 100), I can simply apply the same operation to this column value (add 100).
Do I have to update sigmaID in all rows? Will that be efficient?
I would also like to periodically check that sigmaID really is sum(Credit(ID)), for consistency. Am I overdoing it? Will it be inefficient?
Is there any other approach to this (I am worried about efficiency)?
Kindly provide pure SQL queries, as I need to put all of this in an UPDATE trigger that will calculate this rating (and lots of other formulas involving other parameters of the ID). I may have access to a scripting language (PHP/Python), but I don't know for sure, hence the request for pure SQL.
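For reference, here is a minimal sketch of the trigger-maintained total described above, in SQLite-flavoured SQL; the table credits(id, credit) and the one-row totals table are assumptions, not the asker's actual schema:

    -- One-row table holding the running total, instead of repeating sigmaID in every row.
    CREATE TABLE credit_totals (
        id           INTEGER PRIMARY KEY CHECK (id = 1),
        total_credit NUMERIC NOT NULL DEFAULT 0
    );

    -- Keep the total in step with every change to a credit.
    CREATE TRIGGER trg_credits_after_update
    AFTER UPDATE OF credit ON credits
    FOR EACH ROW
    BEGIN
        UPDATE credit_totals
        SET    total_credit = total_credit - OLD.credit + NEW.credit
        WHERE  id = 1;
    END;

    -- rate(ID) can then be computed on demand without touching every row.
    SELECT c.id, c.credit * 1.0 / t.total_credit AS rate
    FROM   credits AS c, credit_totals AS t;

    -- Periodic consistency check: the stored total should match the recomputed sum.
    SELECT (SELECT total_credit FROM credit_totals WHERE id = 1) AS stored_total,
           (SELECT SUM(credit)  FROM credits)                    AS recomputed_total;

Similar AFTER INSERT and AFTER DELETE triggers would be needed so the total also tracks added and removed IDs.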
Unfortunately my English is very poor.
If I understand correctly, you want to keep all the information about your column values, past and present? That is, you want to log it? If so, you can create a journal table and log everything you want into it with a trigger.
Best regards,
tato mumladze

Dynamically creating tables as a means of partitioning: OK or bad practice?

Is it reasonable for an application to create database tables dynamically as a means of partitioning?
For example, say I have a large table "widgets" with a "userID" column identifying the owner of each row. If this table tended to grow extremely large, would it make sense to instead have the application create a new table called "widgets_{username}" for each new user? Assume that the application will only ever have to query for widgets belonging to a single user at a time (i.e. no need to try and join any of these user widget tables together).
Doing this would break up the one large table into more easily-managed chunks, but this doesn't seem like an elegant solution. In my mind, the database schema should be defined when the application is written, and any runtime data is stored as rows, not as additional tables.
As a more general question, is modifying the database schema at runtime ever ok?
Edit: This question is mostly hypothetical; I had a pretty good feeling that creating tables at runtime didn't make sense. That said, we do have a table with millions of rows in our application. SELECTs perform fine, but things like deleting all rows owned by a particular user can take a while. Basically, I'm looking for some solid reasoning why dynamically creating a table per user doesn't make sense, so I have an answer ready when I'm asked.
NO, NO, NO!! Now repeat after me: I will not do this, because it will create many headaches and problems in the future! Databases are made to handle large amounts of information, and they use indexes to find what you are after quickly. Think of a phone book: how effective is its index? Would it be better to have a different book for each last name?
This will not gain you anything performance-wise. Keep a single table, but be sure to index on userID and you'll be able to get the data fast. If you split the table up, it becomes impossible (or really, really hard) to get any information that spans multiple users, such as searching all users for a certain widget or counting all widgets of a certain type, and every query would have to be built dynamically.
If deleting rows is slow, look into that. How many rows at a time are we talking about: 10, 1,000, 100,000? What is the clustered index on this table? Could you use a "soft delete", where you have a status column that you UPDATE to 'D' to mark the row as deleted? Could you delete the rows at a later time, when there is less database activity? Is the delete slow because it is being blocked by other activity? Look into these before you break up the table.
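To make that concrete, here is a rough T-SQL-flavoured sketch; the widgets table comes from the question, but the column, index, and flag values are invented:

    -- Index the owner column so per-user SELECTs and DELETEs can seek instead of scan.
    CREATE INDEX IX_widgets_userID ON widgets (userID);

    -- "Soft delete": add a status flag and mark rows instead of physically deleting them.
    ALTER TABLE widgets ADD status CHAR(1) NOT NULL DEFAULT 'A';   -- 'A' = active

    UPDATE widgets SET status = 'D' WHERE userID = 42;             -- cheap, short-lived locks

    -- Physically purge the marked rows later, during a quiet period.
    DELETE FROM widgets WHERE status = 'D';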
No, that would be a bad idea. However some DBMSs (e.g. Oracle) allow a single table to be partitioned on values of a column, which would achieve the objective without creating new tables at run time. Having said that, it is not "the norm" to partition tables like this: it is only usually done in very large databases.
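For reference, a minimal sketch of what that looks like in Oracle (the column list is invented):

    -- One logical table, physically split into partitions by the owning user.
    CREATE TABLE widgets (
        widget_id NUMBER PRIMARY KEY,
        user_id   NUMBER NOT NULL,
        name      VARCHAR2(100)
    )
    PARTITION BY HASH (user_id) PARTITIONS 16;

Queries that filter on user_id only touch the relevant partition, and no tables are ever created at run time.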
Using an index on userID should result in nearly the same performance.
In my opinion, changing the database schema at runtime is bad practice.
Consider, for example, security issues...
"Is it reasonable for an application to create database tables dynamically as a means of partitioning?"
No. :)

How to retrieve only updated/new records since the last query in SQL?

I was asked to design a class for caching SQL query results. Calling the class's query method will query and cache the entire result set the first time; afterward, each subsequent query will retrieve only the updated portion and merge it into the cache.
If the class is required to be generic, i.e. to have NO knowledge of the database or the tables, do you have any ideas?
Is it possible, and how to retrieve only updated/new records since the last query?
As far as I know, neither MySQL nor any other SQL-based database stores the "last update time" or "insert time" of a row anywhere, at least not anywhere you can query (parsing the binary log is probably not something you're looking to do).
To get "new records" you will need to either
Query for all records with an autoincrement value higher than the last known max
Add a datetime/timestamp to the table that would store the insert/updated time of the record, and query based on that
You could consider having your code create its own table that would store a record every time an UPDATE or INSERT happened on another table, and then add triggers to all those other tables that would populate your table.
But that might be a bit much.
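A rough sketch of the two query-based options above, in MySQL; the table some_table and the watermark values the application would supply are assumptions:

    -- Option 1: remember the highest autoincrement id already cached and query past it.
    SELECT *
    FROM   some_table
    WHERE  id > 12345            -- last id the cache has seen, supplied by the application
    ORDER  BY id;

    -- Option 2: add a maintained timestamp column and query on it.
    ALTER TABLE some_table
        ADD COLUMN updated_at TIMESTAMP NOT NULL
            DEFAULT CURRENT_TIMESTAMP
            ON UPDATE CURRENT_TIMESTAMP;

    SELECT *
    FROM   some_table
    WHERE  updated_at > '2013-01-01 00:00:00';   -- last time the cache queried

Note that option 1 only catches new rows; updated rows keep their old id, which is why the timestamp column is the more general of the two.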
Either your cache class has to be notified of all updates, or you need some kind of expiration.
The only other option is to query the database to check, but then you kind of lose the point of the cache :)
The above is true for a "generic" cache. If you implement a specialized cache for a particular table you might be able to exploit DML patterns.

suggest a method for updating data in many tables with random data?

I've got about 25 tables that I'd like to update with random data that's picked from a subset of data. I'd like the data to be picked at random but meaningful -- like changing all the first names in a database to new first names at random. So I don't want random garbage in the fields, I'd like to pull from a temp table that's populated ahead of time.
The only way I can think of to do this is with a loop and some dynamic sql.
insert the pick-from names into a temp table with an id field
for each table name in a list of tables:
    build dynamic SQL that updates all first-name fields to a name picked at random from the temp table, based on rand() * max(id) from the temp table
But anytime I think "loop" in SQL I figure I'm doing something wrong.
The database in question has a lot of denormalized tables in it, so that's why I think I'd need a loop (the first name fields are scattered across the database).
Is there a better way?
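To make the loop concrete, here is a rough T-SQL sketch of one generated UPDATE; the #first_names temp table, the customers table, and all column names are invented for illustration:

    -- Pick-from table with a dense id, populated ahead of time (contents are made up).
    CREATE TABLE #first_names (id INT IDENTITY(1,1) PRIMARY KEY, name VARCHAR(50) NOT NULL);
    INSERT INTO #first_names (name) VALUES ('Alice'), ('Bob'), ('Carol');

    -- One statement like this would be generated per table/column holding a first name.
    -- Referencing the outer row inside the APPLY forces a fresh random pick per row,
    -- instead of one pick being reused for the whole statement.
    UPDATE c
    SET    c.first_name = x.name
    FROM   customers AS c
    CROSS APPLY (SELECT TOP (1) n.name
                 FROM   #first_names AS n
                 WHERE  c.customer_id = c.customer_id
                 ORDER  BY NEWID()) AS x;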
Red Gate have a product called SQL Data Generator that can generate fake names and other fake data for testing purposes. It's not free, but they have a trial so you can test it out, and it might be faster than trying to do it yourself.
(Disclaimer: I have never used this product, but I've been very happy with some of their other products.)
I wrote a stored procedure to do something like this a while back. It is not as good as the Red Gate product and only does names, but if you need something quick and dirty, you can download it from
http://www.joebooth-consulting.com/products/
The script name is GenRandNames.sql
Hope this helps
Breaking the 4th wall a bit by answering my own question.
I did try this as a SQL script. What I learned is that SQL pretty much sucks at randomness. The script was slow and weird: functions that referenced views that existed only for the script and couldn't be created in tempdb.
So I made a console app.
1. Generate your random data; this is easy to do with the Random class (just remember to use only one instance of Random).
2. Figure out which columns and table names you'd like to update, via a script that looks at information_schema.
3. Get the IDs for all the tables that you're going to update, if possible (and wow, will it be slow if you have a large table that doesn't have any good PKs).
4. Update each table 100 rows at a time (rough shape sketched below). Why 100? No idea. Could be 1000. I just picked a number. A Dictionary is handy here: pick a random ID from the dict using the Random class.
5. Wash, rinse, repeat. I updated about 2.2 million rows in an hour this way. Maybe it could be faster, but it was doing many small updates so it didn't get in anyone's way.
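For illustration, each batch in step 4 might look something like this on the SQL side (T-SQL flavour; table, column, and parameter names are invented here, since the actual logic lived in the console app):

    -- Many tiny parameterised updates per batch: each touches one row whose id was
    -- picked at random from the in-memory dictionary, with a name chosen by the
    -- single Random instance in the application.
    UPDATE customers SET first_name = @Name1 WHERE customer_id = @Id1;
    UPDATE customers SET first_name = @Name2 WHERE customer_id = @Id2;
    -- ... repeated up to 100 times, then commit and move on to the next batch.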