Splitting column text - sql

I have a fact table that gets updated daily with customer time-on-app info from a third-party platform that we use, and the identifying number has a bit of text prepended to it. So if the customer ID number is 123, this table is getting populated with something like ABC_123. I need to pull this info for a particular cohort of customers based on their ID numbers, so I was planning to create a temp table with the customer ID number and the time on app, and drop the extra bit of text. So far I have not had any luck finding a way to split the text in that column using the "_" as a delimiter, and I'm hesitant to use a wildcard. Any advice?

Seems like it would be better to add a PERSISTED computed column to the table. Then you have both the original value and the one you want, and you can INDEX the PERSISTED column too.
ALTER TABLE dbo.YourTable ADD GoodID AS CONVERT(int,STUFF(BadID, 1, CHARINDEX('_',BadID),'')) PERSISTED;
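If altering the table is not an option, the same split can be done directly in the query that builds the cohort. Below is a minimal sketch, run through Python's sqlite3 module only so that it is self-contained. The table name time_on_app, the minutes column, and the cohort IDs are made up for illustration, and SQLite's substr/instr stand in for SQL Server's STUFF/CHARINDEX.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical fact table; BadID holds values such as 'ABC_123'.
conn.execute("CREATE TABLE time_on_app (BadID TEXT, minutes INTEGER)")
conn.executemany(
    "INSERT INTO time_on_app VALUES (?, ?)",
    [("ABC_123", 40), ("ABC_456", 15), ("XYZ_789", 5)],
)

# Keep everything after the first underscore and cast it to an integer.
split_sql = """
    SELECT CAST(substr(BadID, instr(BadID, '_') + 1) AS INTEGER) AS GoodID,
           minutes
    FROM time_on_app
"""

cohort = (123, 789)  # made-up cohort of customer IDs
rows = conn.execute(
    f"SELECT GoodID, minutes FROM ({split_sql}) "
    "WHERE GoodID IN (?, ?) ORDER BY GoodID",
    cohort,
).fetchall()
print(rows)
```

Unlike the persisted column, this recomputes the split on every query, so it cannot use an index on the numeric ID.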

Related

How to add a new column with existing and new rows to a table?

I have a table with a unique key plus one column for each day of December 2014 (e.g. D20141226 holds the data for 26/12/2014), so the table has 32 columns (key + 31 days). Each daily column indicates whether the customer had a transaction on that day; no transaction is indicated by a 0.
Now I want to execute the same query on a daily basis, producing a list of unique keys that had a transaction on that specific day. I used this easy script:
CREATE TABLE C01012015 AS
SELECT DISTINCT CALLING_ISDN AS A_PARTY
FROM CDRICC_012015
WHERE CALL_STA_TIME ::date = '2015-01-01'
Now my question is, how can I add the content of the new daily table to the existing table with the 31 days, making it effectively a table with 32 days of data (and then continue to do so on a daily basis to store up to 360 days of data)?
Please note that new customers make transactions every day, so there will be unique keys in the daily table that aren't in the big table holding all the previous days.
It would be ideal if those new rows would automatically get a 0 instead of a NULL but I can work around it if it gets a NULL value (not sure how to make sure it gets a 0 instead).
I thought that a FULL OUTER JOIN would be the solution but that would mean that I have to list all variables in the select statement, which becomes quite large as I add one more column each day. Is there a more elegant way to do this?
Or is SQL just not suited to this, and would a programming language such as R be much better at it?
If you have the option to change your schema completely, you should unpivot your table so that your columns are something like CUSTOMER_ID INTEGER, D DATE, DID_TRANSACTION BOOLEAN. There's a post on the Enzee Community website that suggests using a user-defined table function (UDTF) to do this. If you change your schema in this way, a simple insert will work just fine and there will be no need to add columns dynamically.
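To make the unpivoted layout concrete, here is a minimal sketch, run through Python's sqlite3 module so that it is self-contained (the question itself is about Netezza). The table names daily_activity and daily_keys are made up. As a variation on the schema above, it stores a row only for days that had a transaction, so the DID_TRANSACTION flag is implicit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Long ("unpivoted") layout: one row per customer per day with a transaction.
conn.execute("""
    CREATE TABLE daily_activity (
        customer_id INTEGER NOT NULL,
        d           TEXT    NOT NULL,   -- ISO date, e.g. '2015-01-01'
        PRIMARY KEY (customer_id, d)
    )
""")
# Stand-in for the daily table of distinct keys (C01012015 in the question).
conn.execute("CREATE TABLE daily_keys (a_party INTEGER)")
conn.executemany("INSERT INTO daily_keys VALUES (?)", [(1,), (2,), (5,)])
# Customer 1 already has history from an earlier day.
conn.execute("INSERT INTO daily_activity VALUES (1, '2014-12-31')")

# Appending a day is a plain INSERT ... SELECT; no new column is needed.
conn.execute(
    "INSERT INTO daily_activity (customer_id, d) "
    "SELECT a_party, '2015-01-01' FROM daily_keys"
)

# Number of active days per customer, including the new customers 2 and 5.
active_days = conn.execute(
    "SELECT customer_id, COUNT(*) FROM daily_activity "
    "GROUP BY customer_id ORDER BY customer_id"
).fetchall()
print(active_days)
```

New customers simply show up as new rows, and a missing row means "no transaction", so the 0-versus-NULL question does not arise.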
If you can't change your schema that much but you're still able to add columns, you could add a column for every day of the year up front with a default value of FALSE (assuming it's a boolean column representing whether the customer had a transaction or not on that day). You probably want to script this.
ALTER TABLE table_with_daily_columns MODIFY COLUMN (D20140101 BOOLEAN DEFAULT FALSE);
ALTER TABLE table_with_daily_columns MODIFY COLUMN (D20140102 BOOLEAN DEFAULT FALSE);
-- etc
ALTER TABLE table_with_daily_columns ADD COLUMN (D20150101 BOOLEAN DEFAULT FALSE);
GROOM TABLE table_with_daily_columns;
When you alter a table like this, Netezza creates a new table and an internal view that does a UNION of the new table and the old. You need to GROOM the table to merge the tables back into a single one for improved performance.
If you really must keep one column per day, then you'll have to use the method you described to pivot the data from your daily transaction table. Set the default value for each of your columns to 0 or FALSE as described above, then:
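-- List the target columns explicitly; every other day column takes its default.
-- A_PARTY is the key column of the daily table created in the question.
INSERT INTO table_with_daily_columns (cust_id, D20150101)
SELECT
A_PARTY,
TRUE
FROM C01012015;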

query - select data by first inserted [duplicate]

Possible Duplicate:
select bottom rows in natural order
Suppose I have this table:
persons
The columns of the table are NAME and ID,
and I insert these rows:
insert into persons values ('name','id');
insert into persons values ('John','1');
insert into persons values ('Jack','3');
insert into persons values ('Alice','2');
How can I select these rows in the order they were inserted? I would like the query to return:
NAME ID
name id
John 1
Jack 3
Alice 2
Is this possible without an index column (autoincrement)?
I'm pretty sure it's not. To my knowledge, SQL data is not stored sequentially with respect to insertion. The only idea I have is to store a timestamp with each insertion and sort by that timestamp.
This is not possible without adding a column or table containing a timestamp. You could add a timestamp column, or create another table containing IDs and a timestamp and insert into that at the same time.
You cannot make any assumptions about how the DBMS stores data or the order in which it returns rows unless you specify an ORDER BY clause. For example, PostgreSQL uses MVCC: when you update a row, a new physical copy of it is written, often at the end of the table's data file. A plain SELECT typically uses a sequential scan, so the most recently updated row may be returned last.
I have to agree with the other answers: without a specific column to do this, any ordering is unreliable. That said, I don't think I have ever actually had a table without an index.
You will need something to order by. There are other approaches; for example, you could concatenate the ordering information into a string and split the query results apart later.
--EDIT--
Why do you not want to use these methods (timestamp/autoincrement)?
Unless you store some kind of ordering information at insert time, the database does not keep track of the order in which records were inserted (this is probably a good thing ;) ). An autoincrement column is hard to avoid: timestamps alone are not enough, because two rows can carry the same timestamp value.
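For illustration, here is a minimal sketch of the autoincrement approach the answers converge on, run through Python's sqlite3 module so that it is self-contained. The extra column name seq is made up, and the AUTOINCREMENT syntax shown is SQLite's; other databases spell it differently (IDENTITY, SERIAL, AUTO_INCREMENT).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE persons (
        seq  INTEGER PRIMARY KEY AUTOINCREMENT,  -- records insertion order
        name TEXT,
        id   TEXT
    )
""")
# seq is filled in by the database, so the INSERTs look the same as before.
for name, pid in [("John", "1"), ("Jack", "3"), ("Alice", "2")]:
    conn.execute("INSERT INTO persons (name, id) VALUES (?, ?)", (name, pid))

# Only an explicit ORDER BY guarantees the order of the result.
in_order = conn.execute("SELECT name, id FROM persons ORDER BY seq").fetchall()
print(in_order)
```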

What is the proper way to store an array into a database table?

I have an array of 50+ elements that dictates how many hours were worked for a given week.
What is the proper way to store this information into a database table?
My initial idea was to use a delimiter, but the text is too large (280 characters) to fit.
Additionally, there seems to be something "wrong" with creating a table column for each element.
Ideas?
Array using delimiter (comma):
37.5,37.5,37.5,37.5,37.5,37.5,37.5,37.5,37.5,37.5, ...
The "proper" way is to store the array's contents as multiple rows in a whole other table, each with a foreign key referencing the record they belong to back in the first table. There may be other things that work for you, though.
[EDIT]: From the details you added, I'm guessing your array elements are the number of hours worked each week, and you have 50+ of them because a year has 52-ish weeks. I'm also guessing that your current (main) table is called something like "employees" and that each row there has a unique identifier for the employee record. So your new table might be called "work_weeks" and consist of something like employee_id (which matches the employee ID in the current table), week_number, and hours_worked.
Seems like a 1 to many relationship. For this example, tableA is the 1 and tableBlammo is the many.
tableA => column blammoId
tableBlammo => column blammoId, column data
One row in tableA joins to multiple rows in tableBlammo via the blammoId column.
Each row in tableBlammo has one element of the array in the data column.
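Here is a minimal sketch of that one-to-many layout, using the employees / work_weeks names guessed at in the answer above. It runs through Python's sqlite3 module so that it is self-contained; the column types and sample hours are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.execute("CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""
    CREATE TABLE work_weeks (
        employee_id  INTEGER NOT NULL REFERENCES employees (employee_id),
        week_number  INTEGER NOT NULL,
        hours_worked REAL    NOT NULL,
        PRIMARY KEY (employee_id, week_number)
    )
""")
conn.execute("INSERT INTO employees VALUES (1, 'example employee')")

hours = [37.5, 37.5, 40.0]  # the "array", one element per week
# Each array element becomes one row: (week_number, hours_worked).
conn.executemany(
    "INSERT INTO work_weeks VALUES (1, ?, ?)",
    list(enumerate(hours, start=1)),
)

total = conn.execute(
    "SELECT SUM(hours_worked) FROM work_weeks WHERE employee_id = 1"
).fetchone()[0]
print(total)
```

Storing one element per row also means individual weeks can be queried, summed, or updated without parsing a delimited string.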

SQL field as sum of other fields

This is not query related. What I would like to know is whether it's possible to have a column whose value is displayed as the sum of other fields, a bit like Excel does.
As an example, I have two tables:
Recipes
nrecepie integer
name varchar(255)
time integer
and the other
Instructions
nintrucion integer
nrecepie integer
time integer
So, basically as a recipe has n instructions I would like that
recipes.time = sum(intructions.time)
Can this be done in the CREATE TABLE script? If so, how?
You can use a view:
CREATE VIEW recipes_with_time AS
SELECT nrecepie, name, SUM(Instructions.time) AS total_time
FROM Recipes
JOIN Instructions USING (nrecepie)
GROUP BY nrecepie, name
If you really want to have that data in the real table, you must use a trigger.
This could be done with INSERT/UPDATE/DELETE triggers. Every time data is changed in table Instructions, a trigger would run and update the time value in Recipes.
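As a rough sketch of the trigger approach, here is one way it could look, run through Python's sqlite3 module so that it is self-contained. The trigger syntax is SQLite's and differs between vendors, the trigger names are made up, and the table and column names follow the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Recipes (nrecepie INTEGER PRIMARY KEY, name TEXT, time INTEGER DEFAULT 0);
    CREATE TABLE Instructions (nintrucion INTEGER PRIMARY KEY, nrecepie INTEGER, time INTEGER);

    -- One trigger per kind of change; each recomputes the total of the affected recipe.
    CREATE TRIGGER instructions_ai AFTER INSERT ON Instructions BEGIN
        UPDATE Recipes
        SET time = (SELECT COALESCE(SUM(i.time), 0) FROM Instructions AS i
                    WHERE i.nrecepie = Recipes.nrecepie)
        WHERE nrecepie = NEW.nrecepie;
    END;
    CREATE TRIGGER instructions_au AFTER UPDATE ON Instructions BEGIN
        UPDATE Recipes
        SET time = (SELECT COALESCE(SUM(i.time), 0) FROM Instructions AS i
                    WHERE i.nrecepie = Recipes.nrecepie)
        WHERE nrecepie IN (OLD.nrecepie, NEW.nrecepie);
    END;
    CREATE TRIGGER instructions_ad AFTER DELETE ON Instructions BEGIN
        UPDATE Recipes
        SET time = (SELECT COALESCE(SUM(i.time), 0) FROM Instructions AS i
                    WHERE i.nrecepie = Recipes.nrecepie)
        WHERE nrecepie = OLD.nrecepie;
    END;
""")

def recipe_time(recipe_id):
    """Return the stored total time of one recipe."""
    row = conn.execute("SELECT time FROM Recipes WHERE nrecepie = ?", (recipe_id,))
    return row.fetchone()[0]

conn.execute("INSERT INTO Recipes (nrecepie, name) VALUES (1, 'example')")
conn.executemany("INSERT INTO Instructions VALUES (?, 1, ?)", [(1, 10), (2, 5)])
after_insert = recipe_time(1)

conn.execute("UPDATE Instructions SET time = 20 WHERE nintrucion = 1")
after_update = recipe_time(1)

conn.execute("DELETE FROM Instructions WHERE nintrucion = 2")
after_delete = recipe_time(1)
print(after_insert, after_update, after_delete)
```

The three nearly identical trigger bodies are the price of keeping the stored total in sync by hand.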
You can use a trigger to update the time column every time the instructions table is changed, but a more "normal" (less redundant) way would be to compute the time column via a GROUP BY clause on a join between the instructions and recipes tables.
In general, you want to avoid situations like that because you're storing derived information (there are exceptions for performance reasons). Therefore, the best solution is to create a view as suggested by AndreKR. This provides an always-correct total that is as easy to SELECT from the database as if it were in an actual, stored column.
It depends on the database vendor. In SQL Server, for example, you can create a column that calculates its value from the values of other columns in the same row. These are called computed (calculated) columns, and you define one like this:
Create Table MyTable
(
colA Integer,
colB Integer,
colC Integer,
SumABC As colA + colB + colC
)
In general, just put the column name you want, the word AS, and the formula or expression that generates the value. This approach uses no additional storage: the value is calculated each time someone runs a SELECT against it, so the table stays narrower, which can help performance. One limitation concerns indexing, because the value is not stored by default. SQL Server lets you mark the column PERSISTED so that the value is stored whenever the row is created or updated, in which case it can be indexed (deterministic computed columns can generally be indexed as well).
In your example, however, you are accessing data from multiple rows in another table. To do this, you need a trigger, as suggested by other respondents.

How are these tasks done in SQL?

I have a table, and there is no column which stores a field of when the record/row was added. How can I get the latest entry into this table? There would be two cases in this:
Loop through entire table and get the largest ID, if a numeric ID is being used as the identifier. But this would be very inefficient for a large table.
If a random string is being used as the identifier (which is probably very, very bad practise), then this would require more thinking (I personally have no idea other than my first point above).
If I have one field in each row of my table which is numeric, and I want to add it up to get a total (so row 1 has a field which is 3, row 2 has a field which is 7, I want to add all these up and return the total), how would this be done?
Thanks
1) If the id is incremental, "select max(id) as latest from mytable". If a random string was used, there should still be an incremental numeric primary key in addition. Add it. There is no reason not to have one, and databases are optimized to use such a primary key for relations.
2) "select sum(mynumfield) as total from mytable"
For the last part, use SUM():
SELECT SUM(OrderPrice) AS OrderTotal FROM Orders
assuming they are all in the same column.
Your first question is a bit unclear, but if you want to know when a row was inserted (or updated), then the only way is to record the time when the insert/update occurs. Typically, you use a DEFAULT constraint for inserts and a trigger for updates.
If you want to know the maximum value (which may not necessarily be the last inserted row) then use MAX, as others have said:
SELECT MAX(SomeColumn) FROM dbo.SomeTable
If the column is indexed, MSSQL does not need to read the whole table to answer this query.
For the second question, just do this:
SELECT SUM(SomeColumn) FROM dbo.SomeTable
You might want to look into some SQL books and tutorials to pick up the basic syntax.
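To try both queries end to end, here is a small sketch run through Python's sqlite3 module so that it is self-contained. The mytable and mynumfield names come from the first answer, and the sample values 3 and 7 from the question. MAX(id) identifies the latest row only on the assumption that id is an incrementing key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# id is an incrementing integer key; SQLite assigns 1, 2, ... automatically.
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, mynumfield INTEGER)")
conn.executemany("INSERT INTO mytable (mynumfield) VALUES (?)", [(3,), (7,)])

# 1) Latest entry, assuming id grows with every insert.
latest = conn.execute("SELECT MAX(id) AS latest FROM mytable").fetchone()[0]
# 2) Total of a numeric column across all rows.
total = conn.execute("SELECT SUM(mynumfield) AS total FROM mytable").fetchone()[0]
print(latest, total)
```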