How to add a new column with existing and new rows to a table?

I have a table that I created with a unique key and each other column representing one day of December 2014 (e.g. named D20141226 for data from 26/12/2014), so the table consists of 32 columns (key + 31 days). Each daily column indicates whether a customer had a transaction on that specific day, with no transaction indicated by a 0.
Now I want to execute the same query on a daily basis, producing a list of unique keys that had a transaction on that specific day. I used this simple script:
CREATE TABLE C01012015 AS
SELECT DISTINCT CALLING_ISDN AS A_PARTY
FROM CDRICC_012015
WHERE CALL_STA_TIME::date = '2015-01-01';
Now my question is: how can I add the content of the new daily table to the existing table with the 31 days, making it effectively a table with 32 days of data (and then continue to do so on a daily basis, to store up to 360 days of data)?
Please note that new customers are doing transactions every day, hence there will be unique keys in the daily table that aren't in the big table holding all the previous days.
It would be ideal if those new rows automatically got a 0 instead of a NULL, but I can work around it if they get a NULL value (I'm not sure how to make sure they get a 0 instead).
I thought that a FULL OUTER JOIN would be the solution, but that would mean I have to list all columns in the SELECT statement, which becomes quite large as I add one more column each day. Is there a more elegant way to do this?
Or is SQL just not suited to this, and would a programming language like R be much better at it?

If you have the option to change your schema completely, you should unpivot your table so that your columns are something like CUSTOMER_ID INTEGER, D DATE, DID_TRANSACTION BOOLEAN. There's a post on the Enzee Community website that suggests using a user-defined table function (UDTF) to do this. If you change your schema in this way, a simple insert will work just fine and there will be no need to add columns dynamically.
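For illustration, here is a minimal sketch of that unpivoted schema and its daily load, assuming Netezza accepts the same ::date cast used in the question (the UDTF approach from the linked post is not reproduced here, and all names are illustrative):

CREATE TABLE customer_transactions (
    CUSTOMER_ID     INTEGER,
    D               DATE,
    DID_TRANSACTION BOOLEAN DEFAULT FALSE
);

-- The daily load then becomes a plain INSERT; no new columns needed:
INSERT INTO customer_transactions (CUSTOMER_ID, D, DID_TRANSACTION)
SELECT DISTINCT CALLING_ISDN, '2015-01-01'::date, TRUE
FROM CDRICC_012015
WHERE CALL_STA_TIME::date = '2015-01-01';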
If you can't change your schema that much but you're still able to add columns, you could add a column for every day of the year up front with a default value of FALSE (assuming it's a boolean column representing whether the customer had a transaction or not on that day). You probably want to script this.
ALTER TABLE table_with_daily_columns MODIFY COLUMN (D20140101 BOOLEAN DEFAULT FALSE);
ALTER TABLE table_with_daily_columns MODIFY COLUMN (D20140102 BOOLEAN DEFAULT FALSE);
-- etc
ALTER TABLE table_with_daily_columns ADD COLUMN (D20150101 BOOLEAN DEFAULT FALSE);
GROOM TABLE table_with_daily_columns;
When you alter a table like this, Netezza creates a new version of the table and an internal view that does a UNION of the new version and the old one. You need to GROOM the table to merge them back into a single table for improved performance.
If you really must keep one column per day, then you'll have to use the method you described to pivot the data from your daily transaction table. Set the default value for each of your columns to 0 or FALSE as described above, then:
INSERT INTO table_with_daily_columns (cust_id, D20150101)
SELECT
    cust_id,
    TRUE AS D20150101
FROM C01012015;
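Note that a plain INSERT adds a second row for any customer who already exists in the wide table. If that's a concern, a two-step update-then-insert keeps one row per key (a sketch reusing the illustrative names above):

UPDATE table_with_daily_columns
SET D20150101 = TRUE
WHERE cust_id IN (SELECT cust_id FROM C01012015);

INSERT INTO table_with_daily_columns (cust_id, D20150101)
SELECT cust_id, TRUE
FROM C01012015
WHERE cust_id NOT IN (SELECT cust_id FROM table_with_daily_columns);

New customers picked up by the second statement get the default FALSE in every other day column, which addresses the NULL-versus-0 concern from the question.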

Related

Dynamic column name mapped to another table for copying the data

Source table contains data for the last 2 years, e.g.:
column name: aug_16, sep_16 ... oct_17 ... jul_18, aug_18
Column names change every month - one column is added and one is deleted.
For example, next month the columns run from:
sep_16 to sep_18, which means that aug_16 will be deleted and sep_18 will be added.
So my issue is: I want to copy the data to another Interface table and do the mapping in ODI jobs before loading to the Base table. How should I handle the dynamic column names?
I'd say that your best option is to abandon such a data model and create a new one. Instead of doing everything in DDL, do it in DML. What I mean is: alter the table and add a DATE column which shows which month/year each row belongs to. Then it is a simple matter of a WHERE clause to handle the data.
Instead of:
update interface set sep_16 = (select sep_16 from base)
you would:
insert into interface
select * from base
where to_char(date_column, 'yyyymm') = '201609'
This is just an example, written for simplicity of the general idea; such a WHERE clause defeats any index on DATE_COLUMN, but that can be handled (by using a range between the first and last day of the month, for example).
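For completeness, a sketch of the index-friendly range version (assuming DATE_COLUMN is of type DATE and Oracle date literals; the half-open range also covers any time-of-day component):

insert into interface
select * from base
where date_column >= date '2016-09-01'
  and date_column < date '2016-10-01';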

How to add a column for each day in sql?

I'm trying to make an attendance management system for my college project.
I'm planning to create one table for each month.
Each table will have
OCT(Roll_no int, Name varchar, (dates...) bool)
Here the dates will be from 1 to 30 and will store a boolean for present or absent.
Is this a good way to do it?
Is there a way to dynamically add a column for each day as the data is filled in?
Also, how can I populate data according to the current day?
Edit: I'm planning to make a UI which will have only two options (Present, Absent) corresponding to each fetched roll no.
So, roll nos. and names are already going to be in the table. I'll just add the status (present or absent) corresponding to each row in the table for each date.
I would use Firebase. Make a node with a list of users, then inside the users make an attendance node with time-stamps for attended days. That way it's easier to parse. You would also leave room for the ability to bind data from other tables to users, as well as the ability to add additional properties to each user.
Or do the SQL equivalent, which would be to make a table of users (names and user properties) with associated keys (primary keys in the user table, foreign keys in the attendance table) and an attendance table holding time-stamps representing attended days; a SQL sketch follows the example below.
Either way, your UI would then only have to process timestamps and be able to parse through them with dates.
Though maybe add additional columns as the years go by, so it wouldn't be so much of a bulk download.
Edit: In your case you'd want the SQL columns to be by month, letting you select whichever month you'd like. For your UI, on inserting new attendance you'd simply add a column to the table if it does not already exist and then continue with the submission. On search/view you'd handle null results (say there were 2 months where no one attended at all; you'd catch any exceptions and continue with your display).
Ex:
User
Primary Key - Name
1 - Joe
2 - Don
3 - Rob
Attendance
Foreign Key - Dates Array (Oct 2017)
1 - 1508198400, 1508284800, 1508371200
2 - 1508284800
3 - 1508198400, 1508371200
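A rough SQL sketch of that two-table design (names are illustrative; since standard SQL has no array column, each attended day becomes its own row):

CREATE TABLE users (
    user_id INT PRIMARY KEY,
    name    VARCHAR(100) NOT NULL
);

CREATE TABLE attendance (
    user_id     INT NOT NULL REFERENCES users (user_id),
    attended_at INT NOT NULL,  -- Unix timestamp of an attended day
    PRIMARY KEY (user_id, attended_at)
);

-- Joe (user 1) attended on 2017-10-17:
INSERT INTO attendance (user_id, attended_at) VALUES (1, 1508198400);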
I'd agree with Gordon. This is not a good way to store the data. (It might be a good way to present it). If you have a table with the following columns, you will be able to store the data you want:
roll_no (int)
Name (varchar)
Date (Date)
Present (bool)
If you want to then pull out the data for a particular month, you could just add this into your WHERE clause:
WHERE DATEPART(mm, [Date]) = 10 -- for October, or pass in a parameter
Dynamically adding columns is going to be a pain in the neck and is also quite messy.
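To make that concrete, a minimal sketch in T-SQL (the DATEPART above suggests SQL Server; table and column names are illustrative):

CREATE TABLE attendance (
    roll_no INT NOT NULL,
    name    VARCHAR(100) NOT NULL,
    [date]  DATE NOT NULL,
    present BIT NOT NULL DEFAULT 0,
    PRIMARY KEY (roll_no, [date])
);

-- All of October for every student:
SELECT roll_no, name, [date], present
FROM attendance
WHERE DATEPART(mm, [date]) = 10;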

How to create a smart key, which would reset its auto increment value at the end of the month

I am currently working on a project for the management of oil distribution, and I need the receipts of every bill to get stored in a database. I am thinking of building a smart key for the receipts which will contain the first 2 letters of the city, the gas station id, the auto-increment number, the first letter of the month, and the last 2 digits of the year. So it will be somewhat like this:
"AA-3-0001-J15". What I am wondering is how to make the AI number go back to 0001 when the month changes. Any suggestions?
To answer the direct question: how to make the number restart at 1 at the beginning of the month.
Since it is not a simple IDENTITY column, you'll have to implement this functionality yourself.
To generate such a complex value you'll have to write a user-defined function or a stored procedure. Each time you need a new value of your key to insert a new row in the table, you'll call this function or execute this stored procedure.
Inside the function/stored procedure you have to make sure that it works correctly when two different sessions are trying to insert the row at the same time. One possible way to do it is to use sp_getapplock.
You didn't clarify whether the "auto increment" number is a single sequence across all cities and gas stations, or whether each city and gas station has its own sequence of numbers. Let's assume that we want a single sequence of numbers for all cities and gas stations within the same month. When the month changes, the sequence restarts.
The procedure should be able to answer the following question when you run it: Is the row that I'm trying to insert the first row of the current month? If the generated value is the first for the current month, then the counter should be reset to 1.
One method to answer this question is to have a helper table, which would have one row for each month. One column - date, second column - last number of the sequence. Once you have such helper table your stored procedure would check: what is the current month? what is the last number generated for this month? If such number exists in the helper table, increment it in the helper table and use it to compose the key. If such number doesn't exist in the helper table, insert 1 into it and use it to compose the key.
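A minimal T-SQL sketch of that helper-table approach (assuming SQL Server 2012+ for DATEFROMPARTS; all table, procedure, and column names here are illustrative):

CREATE TABLE MonthlyCounter (
    MonthStart DATE PRIMARY KEY,
    LastNumber INT NOT NULL
);
GO
CREATE PROCEDURE GetNextReceiptNumber
    @Number INT OUTPUT
AS
BEGIN
    SET NOCOUNT ON;
    DECLARE @MonthStart DATE = DATEFROMPARTS(YEAR(GETDATE()), MONTH(GETDATE()), 1);

    BEGIN TRANSACTION;
    -- Serialize concurrent sessions so no two get the same number
    EXEC sp_getapplock @Resource = 'ReceiptNumber', @LockMode = 'Exclusive';

    UPDATE MonthlyCounter
    SET LastNumber = LastNumber + 1,
        @Number = LastNumber + 1
    WHERE MonthStart = @MonthStart;

    IF @@ROWCOUNT = 0
    BEGIN
        -- First receipt of this month: restart the sequence at 1
        SET @Number = 1;
        INSERT INTO MonthlyCounter (MonthStart, LastNumber)
        VALUES (@MonthStart, 1);
    END;
    COMMIT;
END;

The caller would then format the full receipt key from the city, gas station id, @Number, and the month/year suffix.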
Finally, I would not recommend making this composite value the primary key of the table. It is very unlikely that a user requirement says "make the primary key of your table like this". It is up to you how you handle it internally, as long as the accountant can see this magic set of letters and numbers next to the transaction in his report and user interface. The accountant doesn't know what a "primary key" is, but you do. And you know how to join a few tables of cities, gas stations, etc. together to get the information you need from a normalized database.
Oh, by the way, sooner or later you will have more than 9999 transactions per month.
Do you want to store all that in one column? That sounds to me like a composite key over four columns...
Which could look like the following:
CREATE TABLE receipts (
CityCode VARCHAR2(2),
GasStationId NUMERIC,
AutoKey NUMERIC,
MonthCode VARCHAR2(2),
PRIMARY KEY (CityCode, GasStationId, AutoKey, MonthCode)
);
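With this design the per-month "reset" falls out naturally, because AutoKey only has to be unique within each (CityCode, GasStationId, MonthCode) group. A hedged usage sketch (the MAX+1 here is illustrative and not safe under concurrency without extra locking):

INSERT INTO receipts (CityCode, GasStationId, AutoKey, MonthCode)
SELECT 'AA', 3, COALESCE(MAX(AutoKey), 0) + 1, 'J15'
FROM receipts
WHERE CityCode = 'AA' AND GasStationId = 3 AND MonthCode = 'J15';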
Which DBMS are you using? (MySQL, MSSQL, PostgreSQL, ...?)
If it's MySQL you could have a batch job which runs on the first of the month and executes:
ALTER TABLE tablename AUTO_INCREMENT = 1
But that logic would be on application layer instead of DB-layer...
In such cases, it is best to use a user-defined function to generate this key and then store it, like:
CREATE FUNCTION MyKeyGenerator (
    @city VARCHAR(250),
    @gas_station_id VARCHAR(250)
)
RETURNS VARCHAR(50)
AS
BEGIN
    /* Do stuff here: look up and increment the monthly counter,
       then compose the key from @city, @gas_station_id, etc. */
    RETURN NULL;  -- placeholder
END
My guess is, you may need another little table that keeps the last generated auto-number for the month, and you may need to update it for the first record generated during the month. For the next records during the month, you will fetch from there and increment by 1. You can also use a stored procedure that returns an integer as a return code, just for the auto-number part, and then do the rest in a function.
Btw, you may want to note that using the first letter of the month has pitfalls, because two months can have the same first letter. Maybe try the two-digit numeric month or the first three letters of the month name instead.
If you are prepared not to insist that the AI number be exactly of identity type, you can have another table where it is a regular non-identity integer, and then run a SQL Server Agent task calling a stored procedure that does the incrementing.

Sql Server 2008 partition table based on insert date

My question is about table partitioning in SQL Server 2008.
I have a program that loads data into a table every 10 mins or so. Approx 40 million rows per day.
The data is bcp'ed into the table and needs to be able to be loaded very quickly.
I would like to partition this table based on the date the data is inserted into the table. Each partition would contain the data loaded in one particular day.
The table should hold the last 50 days of data, so every night I need to drop any partitions older than 50 days.
I would like to have a process that aggregates data loaded into the current partition every hour into some aggregation tables. The summary will only ever run on the latest partition (since all other partitions will already be summarised) so it is important it is partitioned on insert_date.
Generally when querying the data, the insert date is specified (or multiple insert dates). The detailed data is queried by drilling down from the summarised data and as this is summarised based on insert date, the insert date is always specified when querying the detailed data in the partitioned table.
Can I create a column "Insert_date" in the table with a default value of GETDATE() and then partition on this somehow?
OR
I can create a column in the table "insert_date" and put a hard coded value of today's date.
What would the partition function look like?
Would separate tables and a partitioned view be better suited?
I have tried both, and even though I think partitioned tables are cooler, after trying to teach others how to maintain the code afterwards it just wasn't justified. In that scenario we used a hard-coded date field that was set in the insert statement.
Now I use separate tables (31 days / 31 tables) plus an aggregation table, and there is an ugly UNION ALL query that joins together the monthly data.
Advantage: super simple SQL, simple C# code for bcp, and nobody has complained about complexity.
But if you have the infrastructure and a gaggle of .NET/SQL gurus, I would choose the partitioning strategy.
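If you do go with partitioning, a minimal sketch of the pieces the question asks about might look like this (names are illustrative; a nightly job would use ALTER PARTITION FUNCTION to SPLIT in tomorrow's boundary and SWITCH/MERGE out the oldest day):

CREATE PARTITION FUNCTION pf_insert_date (DATE)
AS RANGE RIGHT FOR VALUES ('2015-01-01', '2015-01-02', '2015-01-03');

CREATE PARTITION SCHEME ps_insert_date
AS PARTITION pf_insert_date ALL TO ([PRIMARY]);

-- A default of GETDATE() on the partitioning column works as the question hopes:
CREATE TABLE loaded_data (
    id          BIGINT IDENTITY(1,1) NOT NULL,
    payload     VARCHAR(100),
    insert_date DATE NOT NULL DEFAULT (CONVERT(DATE, GETDATE()))
) ON ps_insert_date (insert_date);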

Performing mass DELETE operation in Oracle 11g

I have a table MyTable with multiple int columns and one column containing a date. The date column has an index, created as follows:
CREATE INDEX some_index_name ON MyTable(my_date_column)
because the table will often be queried for its contents within a user-specified date range. The table has no foreign keys pointing to it, nor any other indexes besides the primary key, which is an auto-incrementing key filled by a sequence/trigger.
Now, the issue I have is that the data in this table is often replaced for a given time period because it was out of date. The way it is updated is by deleting all the entries within a given time period and inserting the new ones. The delete is performed using:
DELETE FROM MyTable
WHERE my_date_column >= initialDate
AND my_date_column < endDate
However, because the number of rows deleted is massive (from 5 million to 12 million rows), the program pretty much blocks during the delete.
Is there something I can disable to make the operation faster? Or maybe an option I can specify on the index? I read something about redo space having to do with this, but I don't know how to disable it during an operation.
EDIT: The process runs every day; it deletes the last 5 days of data, then brings in the data for those 5 days (which may have changed in the external source) and reinserts it.
The amount of data deleted is a tiny fraction of the whole table (< 1%), so copying the data I want to keep into another table and dropping and recreating the table may not be the best solution.
I can only think of two ways to speed this up.
If you do this on a regular basis, you should consider partitioning your table by month. Then you just drop the partition of the month you want to delete, which is basically as fast as dropping a table. (Partitioning requires an Enterprise Edition license, if I'm not mistaken.)
Alternatively, create a new table with the data you want to keep (using CREATE TABLE new_table AS SELECT ...), drop the old table, and rename the interim table. This will be much faster, but has the drawback that you need to re-create all indexes and (primary, foreign key) constraints on the new table.
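Hedged sketches of both options in Oracle syntax (partition and table names, and the date range, are illustrative):

-- Option 1: if the table were range-partitioned by month, dropping old data
-- is a metadata operation:
ALTER TABLE MyTable DROP PARTITION p_2014_12;

-- Option 2: copy-and-swap; indexes and constraints must then be re-created:
CREATE TABLE MyTable_keep AS
SELECT * FROM MyTable
WHERE my_date_column < DATE '2015-01-01'
   OR my_date_column >= DATE '2015-01-06';

DROP TABLE MyTable;
ALTER TABLE MyTable_keep RENAME TO MyTable;
CREATE INDEX some_index_name ON MyTable(my_date_column);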