How do I update a database that's in use? - sql

I'm building a web application using ASP.NET MVC with SQL Server, and my development process is going to look like this:
Make changes in SQL Server locally
Create LINQ-to-SQL classes as necessary
Before committing any change set that includes a database change, script out the database so that I can regenerate it if I ever need to
What I'm confused about is how I'm going to update the production database, which will have live data in it.
For example, let's say I have a table like
People
========================================
Id | FirstName | LastName | FatherId
----------------------------------------
1 | 'Anakin' | 'Skywalker' | NULL
2 | 'Luke' | 'Skywalker' | 1
3 | 'Leah' | 'Skywalker' | 1
in production and locally and let's say I add an extra column locally
ALTER TABLE People ADD LightsaberColor VARCHAR(16);
and update my LINQ-to-SQL classes, script it out, test it with sample data, and decide that I want to add that column to production.
As part of a deployment process, how would I do that? Is there a tool that could read my database generation file (call it GenerateDb.sql) and figure out that it needs to update the production People table, putting default values in the new column, like
People
==========================================================
Id | FirstName | LastName | FatherId | LightsaberColor
----------------------------------------------------------
1 | 'Anakin' | 'Skywalker' | NULL | NULL
2 | 'Luke' | 'Skywalker' | 1 | NULL
3 | 'Leah' | 'Skywalker' | 1 | NULL
???

You should have a staging DB that is identical to the production database.
When you make any changes to the database, you should apply them to the staging DB first, and you can of course compare the dev and staging DBs to generate a script with the differences.
Visual Studio has a Schema Compare feature that generates a script with the differences between two databases.
There are some other tools as well that do the same.
So you can generate the script, apply it to the staging DB, and if everything goes fine, apply the same script to the production DB.
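For the LightsaberColor example, the generated script would be roughly along these lines (a sketch of what a schema-compare tool typically emits, not its exact output; the dbo schema is assumed):
-- Adds the new column to production; existing rows simply get NULL
ALTER TABLE dbo.People
    ADD LightsaberColor VARCHAR(16) NULL;
GO
-- If existing rows should instead be back-filled with a non-NULL default:
-- ALTER TABLE dbo.People
--     ADD LightsaberColor VARCHAR(16) NOT NULL
--         CONSTRAINT DF_People_LightsaberColor DEFAULT 'Unknown';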

Actually, that is right: you should have a staging process. Whenever we commit features, we use TFS to promote them from Development to Production; that step is called staging, and you can look up the history in TFS, whether for the database or for the solution. If you're not using TFS with Visual Studio and MSSQL Server, I guess that you are committing your features directly to your server, which is effectively your production environment; you can instead test the changes on a test server first to see their effect.
Another thing: if you're asking about the script itself and you use stored procedures, you can use temporary tables.
I guess this is your first time committing to a live server.

Related

AWS Athena Table Data Update

I have started testing out AWS Athena, and so far it looks good. One problem I am having is about the updating of data in a table.
Here is the scenario: in order to update the data for a given date in the table, I am basically emptying out the S3 bucket that contains the CSV files and uploading the new files to become the updated data source. However, the period of time during which the bucket is empty (i.e. when the old source is deleted and the new source is being uploaded) is a bottleneck, because during this interval anyone querying the table will get no results.
Is there a way around this?
Thanks.
Athena is a web service that allows you to query data which resides on AWS S3. In order to run queries, Athena needs to know the table schema and where to look for data on S3. All this information is stored in the AWS Glue Data Catalog. This essentially means that each time you get new data, you simply need to upload a new CSV file to S3.
Let's assume that you get new data every day at midnight and you store it in an S3 bucket:
my-data-bucket
├── data-file-2019-01-01.csv
├── data-file-2019-01-02.csv
└── data-file-2019-01-03.csv
and each of these files looks like:
| date | volume | product | price |
|------------|---------|---------|-------|
| 2019-01-01 | 100 | apple | 10 |
| 2019-01-01 | 200 | orange | 50 |
| 2019-01-01 | 50 | cherry | 100 |
Then, after you have uploaded them to AWS S3, you can use the following DDL statement to define the table:
CREATE EXTERNAL TABLE `my_table`(
  `date` date,        -- the sample values are plain dates (yyyy-MM-dd)
  `volume` int,
  `product` string,
  `price` double)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','   -- the files are CSV
LOCATION
  's3://my-data-bucket/'     -- same bucket as above
TBLPROPERTIES ('skip.header.line.count'='1')  -- skip the header row in each file
Now, when you get a new file data-file-2019-01-04.csv and upload it to the same location as the other files, Athena will be able to query the new data as well:
my-data-bucket
├── data-file-2019-01-01.csv
├── data-file-2019-01-02.csv
├── data-file-2019-01-03.csv
└── data-file-2019-01-04.csv
Update 2019-09-19
If your scenario is one where you need to update data that is already in the S3 bucket, then you can try combining views, tables, and keeping different versions of the data.
Let's say you have table_v1 that queries data in s3://my-data-bucket/v1/ location. You create a view for table_v1 which can be seen as a wrapper of some sort:
CREATE VIEW `my_table_view` AS
SELECT *
FROM `table_v1`
Now your users can use my_table_view to query data in s3://my-data-bucket/v1/ instead of table_v1. When you want to update data, you can simply upload it to s3://my-data-bucket/v2/ and define a table table_v2. Next, you need to update your my_table_view view, since all queries run against it:
CREATE OR REPLACE VIEW `my_table_view` AS
SELECT *
FROM `table_v2`
After this is done, you can drop table_v1 and delete the files from s3://my-data-bucket/v1/. Provided that the data schema hasn't changed, all queries that ran against the my_table_view view while it was based on table_v1 should still be valid and succeed after my_table_view has been replaced.
I don't know what the downtime of replacing a view would be, but I'd expect it to be less than a second, which is definitely less than the time it takes to upload new files.
What most people want to do is probably MSCK REPAIR TABLE <table_name>.
This updates the metadata if you have added more files to the location, but it is only available if your table has partitions.
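As a sketch only (the dt partition column and the Hive-style key=value prefixes are assumptions, not something from the question), a partitioned setup would look roughly like this:
-- Assumed layout: s3://my-data-bucket/dt=2019-01-01/data-file.csv, dt=2019-01-02/..., etc.
CREATE EXTERNAL TABLE `my_partitioned_table`(
  `volume` int,
  `product` string,
  `price` double)
PARTITIONED BY (`dt` string)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
LOCATION 's3://my-data-bucket/';

-- After uploading files under a new dt=... prefix, load the new partition metadata:
MSCK REPAIR TABLE my_partitioned_table;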
You might also want to do this with a Glue Crawler which can be scheduled to refresh the table with new data.
Relevant documentation.

Load data from csv in google cloud storage as bigquery 'in' query

I want to compose a query like the following in BigQuery, with my file stored in Google Cloud Storage:
select * from my_table where id in ('gs://bucket_name/file_name.csv')
I get no results. Is this possible, or am I missing something?
Using the CLI or API, you are able to run ad hoc queries against GCS files without creating tables; a full example is covered here: Accessing external (federated) data sources with BigQuery’s data access layer.
A code snippet is here:
bq query --external_table_definition=healthwatch::date:DATETIME,bpm:INTEGER,sleep:STRING,type:STRING@CSV=gs://healthwatch2/healthwatchdetail*.csv 'SELECT date,bpm,type FROM healthwatch WHERE type = "elevated" and bpm > 150;'
Waiting on bqjob_r5770d3fba8d81732_00000162ad25a6b8_1 ... (0s)
Current status: DONE
+---------------------+-----+----------+
| date | bpm | type |
+---------------------+-----+----------+
| 2018-02-07T11:14:44 | 186 | elevated |
| 2018-02-07T11:14:49 | 184 | elevated |
+---------------------+-----+----------+
On the other hand, you can create a permanent EXTERNAL table with schema auto-detection to make it usable from the web UI and persistent; read more about that here: Querying Cloud Storage Data.
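For instance, in projects where BigQuery's DDL is available, a permanent external table over the same files could be declared roughly like this (the dataset and table names are placeholders, not from the question):
-- Placeholder dataset/table names; the URI reuses the bucket from the snippet above
CREATE EXTERNAL TABLE mydataset.healthwatch_ext
OPTIONS (
  format = 'CSV',
  uris = ['gs://healthwatch2/healthwatchdetail*.csv'],
  skip_leading_rows = 1   -- assumes the CSV files have a header row
);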

How to set DTU for Azure Sql Database via SQL when copying?

I know that you can create a new Azure SQL DB by copying an existing one by running the following SQL command in the [master] db of the destination server:
CREATE DATABASE [New_DB_Name] AS COPY OF [Azure_Server_Name].[Existing_DB_Name]
What I want to find out is whether it's possible to change the number of DTUs the copy will have at the time the copy is created.
As a real-life example, if we're copying a [prod] database to create a new [qa] database, the copy might only need resources to handle a small testing team hitting the QA DB, not a full production audience. Scaling down the assigned DTUs would result in a cheaper DB. At the moment we manually scale after the copy is complete, but this takes just as long as the initial copy (several hours for our larger DBs) as it copies the database yet again. In an ideal world we would like to skip that step and be able to fully automate the copy process.
According to the docs, it is:
CREATE DATABASE database_name
AS COPY OF [source_server_name.] source_database_name
[ ( SERVICE_OBJECTIVE =
{ 'basic' | 'S0' | 'S1' | 'S2' | 'S3' | 'S4'| 'S6'| 'S7'| 'S9'| 'S12' |
| 'GP_GEN4_1' | 'GP_GEN4_2' | 'GP_GEN4_4' | 'GP_GEN4_8' | 'GP_GEN4_16' | 'GP_GEN4_24' |
| 'BC_GEN4_1' | 'BC_GEN4_2' | 'BC_GEN4_4' | 'BC_GEN4_8' | 'BC_GEN4_16' | 'BC_GEN4_24' |
| 'GP_GEN5_2' | 'GP_GEN5_4' | 'GP_GEN5_8' | 'GP_GEN5_16' | 'GP_GEN5_24' | 'GP_GEN5_32' | 'GP_GEN5_48' | 'GP_GEN5_80' |
| 'BC_GEN5_2' | 'BC_GEN5_4' | 'BC_GEN5_8' | 'BC_GEN5_16' | 'BC_GEN5_24' | 'BC_GEN5_32' | 'BC_GEN5_48' | 'BC_GEN5_80' |
| { ELASTIC_POOL(name = <elastic_pool_name>) } } )
]
[;]
CREATE DATABASE (sqldbls)
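For the prod-to-QA scenario in the question, that would look something like the following (the server and database names are made up), keeping in mind the same-service-tier caveat discussed below:
-- Run in the master DB of the destination server; names are hypothetical
CREATE DATABASE [QA_DB]
  AS COPY OF [prod-server].[Prod_DB]
  ( SERVICE_OBJECTIVE = 'S1' );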
You can also change the DTU level during a copy from the PowerShell API
New-AzureRmSqlDatabaseCopy
But you can only choose "a different performance level within the same service tier (edition)"; see Copy an Azure SQL Database.
You can, however, copy the database into an elastic pool in the same service tier, so you wouldn't be allocating new DTU resources. You might have a single pool for all your dev/test/qa databases and drop the copy there.
If you want to change the service tier, you could do a Point-in-time Restore instead of a Database Copy. The database can be restored to any service tier or performance level, using the Portal, PowerShell or REST.
Recover an Azure SQL database using automated database backups

Import data from Excel to PostgreSQL

I have seen questions on Stack Overflow similar to or the same as the one I am asking now; however, I couldn't manage to solve it in my situation.
Here is the thing:
I have an Excel spreadsheet (.xlsx) which I converted to comma-separated values (.csv), as suggested in some answers.
My Excel file looks something like this:
--------------------------------------------------
name | surname | voteNo | VoteA | VoteB | VoteC
--------------------------------------------------
john | smith | 1001 | 30 | 154 | 25
--------------------------------------------------
anothe| person | 1002 | 430 | 34 | 234
--------------------------------------------------
other | one | 1003 | 35 | 154 | 24
--------------------------------------------------
john | smith | 1004 | 123 | 234 | 53
--------------------------------------------------
john | smith | 1005 | 23 | 233 | 234
--------------------------------------------------
In PostgreSQL I created a table named allfields with 6 columns:
the first two as character[] and the last four as integers, with the same names as shown in the Excel table (name, surname, voteno, votea, voteb, votec).
Now I'm doing this:
copy allfields from 'C:\Filepath\filename.csv';
But I'm getting this error:
could not open file "C:\Filepath\filename.csv" for reading: Permission denied
SQL state: 42501
My questions are:
Should I create those columns in the allfields table in PostgreSQL?
Do I have to modify anything else in the Excel file?
And why do I get this 'permission denied' error?
Based on your file, neither of the first two columns needs to be an array type (character[]) - unlike C-strings, the "character" type in postgres is a string already. You might want to make things easier and use varchar as the type of those two columns instead.
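A table definition along those lines might look like this (the varchar lengths are arbitrary assumptions):
-- Sketch only; lengths are arbitrary
CREATE TABLE allfields (
    name    varchar(100),
    surname varchar(100),
    voteno  integer,
    votea   integer,
    voteb   integer,
    votec   integer
);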
I don't think you do.
Check that you don't still have that file open and locked in excel - if you did a "save as" to convert from xlsx to csv from within excel then you'll likely need to close out the file in excel.
SQL state 42501 in PostgreSQL means you don't have permission to perform such an operation in the intended schema. This error code list shows that.
Check that you're pointing to the correct schema and that your user has enough privileges.
The documentation also states that you need the select privilege on the origin table and the insert privilege on the destination table:
You must have select privilege on the table whose values are read by
COPY TO, and insert privilege on the table into which values are
inserted by COPY FROM. It is sufficient to have column privileges on
the column(s) listed in the command.
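For completeness, granting the privilege mentioned in that quote would look like this (the role name is hypothetical):
-- Hypothetical role; gives it the insert privilege COPY FROM needs on the target table
GRANT INSERT ON allfields TO report_user;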
Yes, I think you can. For the COPY command there is an optional HEADER clause; check
http://www.postgresql.org/docs/9.2/static/sql-copy.html (a sketch is shown after these points).
I don't think so. With my #1 and #3, it should work.
You need superuser permission for that.
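A minimal sketch of the server-side COPY with the HEADER option (using the path from the question; the PostgreSQL server process itself must be able to read that path):
-- Server-side COPY: the file is opened by the server; FORMAT csv and HEADER tell it to parse a headed CSV
COPY allfields FROM 'C:\Filepath\filename.csv' WITH (FORMAT csv, HEADER true);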
1) Should I create those columns in allfields table in PostgreSQL?
Use text for the character fields. Not an array in any case, as @yieldsfalsehood correctly pointed out.
2) Do I have to modify anything else in Excel file?
No.
3) And why I get this 'permission denied' error?
The file needs to be accessible to the system user postgres (or whatever user you are running the postgres server as). Per the documentation:
COPY with a file name instructs the PostgreSQL server to directly read
from or write to a file. The file must be accessible to the server and
the name must be specified from the viewpoint of the server.
The privileges of the database user are not the cause of the problem. However (quoting the same page):
COPY naming a file or command is only allowed to database superusers,
since it allows reading or writing any file that the server has privileges to access.
Regarding the permission problem, if you are using psql to issue the COPY command, try using \copy instead.
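For example (same table and path as in the question), the client-side variant reads the file with psql's own permissions, so no superuser is needed:
-- psql meta-command; must be written on a single line
\copy allfields FROM 'C:\Filepath\filename.csv' WITH (FORMAT csv, HEADER)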
OK, the problem was that I needed to change the path of the Excel file. I put it in a public location where all users can access it.
If you face the same problem, move your Excel file to e.g. the C:\Users\Public folder (this folder is public and has no restrictions); otherwise you have to deal with Windows permission issues.
For those who do not wish to move the files they want to read to a different (public) location for some reason, here is a clear solution:
Right click the folder holding the file and select properties.
Select the Security tab under properties.
Select Edit
Select Add
Under the field "Enter the object names to select", type in Everyone.
Click OK on all the dialog boxes, or Apply if it is enabled.
Try reading the file again.

LDAP user data caching on local database

I am integrating LDAP authentication in my enterprise web application. I would like to show listings of people's names and emails. Instead of querying the LDAP server for the name and email each time a listing containing several users is displayed, I thought about caching the data locally in the database.
Do you guys know about caching LDAP data best practices?
Should I cache LDAP user data?
When should I insert and refresh the data?
I did the same thing when developing web applications with LDAP authentication.
Each time a user logs in, I retrieve their LDAP uid and check whether it is in the database. If not, I get the user information from LDAP (in your case: name, surname(?) and email) and insert it into the user table of the database.
The user table schema should look like this :
________________
| User |
________________
| - id |
| - ldap_uid |
| - name |
| - first_name |
| - mail |
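A minimal sketch of that cache table and the insert-or-refresh step on login (PostgreSQL-flavoured syntax and hypothetical values; adapt to your database):
-- Local cache of LDAP user data; "app_user" is used here because "user" is a reserved word
CREATE TABLE app_user (
    id         serial PRIMARY KEY,
    ldap_uid   varchar(255) UNIQUE NOT NULL,
    name       varchar(255),
    first_name varchar(255),
    mail       varchar(255)
);

-- On login: cache the user if the uid is not stored yet, otherwise refresh the cached values
INSERT INTO app_user (ldap_uid, name, first_name, mail)
VALUES ('jdoe', 'Doe', 'John', 'jdoe@example.com')   -- values come from the LDAP lookup
ON CONFLICT (ldap_uid) DO UPDATE
    SET name = EXCLUDED.name,
        first_name = EXCLUDED.first_name,
        mail = EXCLUDED.mail;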