What is the difference between a schema and a database? I have no knowledge beyond running basic SQL commands such as SELECT, UPDATE, INSERT, and DELETE.
A database is a collection of organized data, while a database schema describes the structure and organization of the data in a database system. The database holds the records, fields, and cells of data; the schema describes how those fields and cells are structured and organized, and what kinds of relationships exist between those entities. Understandably, the schema of a database stays fairly constant once created, while the actual data in the database tables may change all the time.
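For illustration, here is a minimal sketch of the distinction (the table and column names are invented): the CREATE TABLE statement defines part of the schema, while INSERT, UPDATE, and DELETE change the data held inside that structure.

```sql
-- Schema: the structure, defined once and changed only rarely.
CREATE TABLE Customer (
    CustomerId INT PRIMARY KEY,
    Name       VARCHAR(100) NOT NULL,
    Email      VARCHAR(255)
);

-- Data: the contents, which may change all the time.
INSERT INTO Customer (CustomerId, Name, Email)
VALUES (1, 'Alice', 'alice@example.com');

UPDATE Customer SET Email = 'alice@new.example.com' WHERE CustomerId = 1;
DELETE FROM Customer WHERE CustomerId = 1;
```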
I'm dealing with a SQL Server database which contains a column "defined data" with JSON data in it (and some other simple columns). The data builds up over time; right now we have about 8 million rows.
The data from this db is periodically read by an ETL system, which reads the JSON data in the "defined data" column and maps it to a new SQL Server table based on the column names contained in the JSON data.
This SQL Server table is prone to changes, meaning that about every 4 months additional columns are needed or column names change. Whenever this SQL Server table changes its data structure, a new version is introduced, which also forces the JSON data structure to change.
However, the ETL system should still be able to load all historical (JSON) data from the SQL Server database, regardless of the changing version throughout time. How can I make this work, taking into consideration version changes of the SQL Server tables and the JSON data?
[example image: sample rows for client 20 (version 1 JSON) and client 21 (version 2 JSON, which adds an "AssetType" field)]
So in this example my question is:
How can I ensure that I can load both client 20 and client 21 into one SQL Server table without getting errors because the historical JSON data does not reflect version 2?
Given the size of the SQL Server database, it doesn't seem like an option to update all historical JSON data according to the latest version (in this example that would mean adding "AssetType" for the 01-01-2021 data and filling it in with NULL).
Many, many thanks in advance!
First, I would check whether the JSON fields already exist as column names in the target table by looking them up in the information schema. If a column does not exist, run an ALTER TABLE ... ADD to create it.
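A rough T-SQL sketch of that check; the target table dbo.TargetTable is a placeholder, and AssetType is just the example key from the question:

```sql
-- Add a column for a JSON key if the target table does not have it yet.
DECLARE @ColumnName SYSNAME = N'AssetType';   -- one key found in the JSON
DECLARE @Sql NVARCHAR(MAX);

IF NOT EXISTS (
    SELECT 1
    FROM INFORMATION_SCHEMA.COLUMNS
    WHERE TABLE_SCHEMA = 'dbo'
      AND TABLE_NAME   = 'TargetTable'
      AND COLUMN_NAME  = @ColumnName
)
BEGIN
    SET @Sql = N'ALTER TABLE dbo.TargetTable ADD '
             + QUOTENAME(@ColumnName) + N' NVARCHAR(200) NULL;';
    EXEC sp_executesql @Sql;
END
```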
How can I ensure that I can load both client 20 and client 21 into one SQL Server table without getting errors because the historical JSON data does not reflect version 2?
You maintain two separate tables: a Raw/Staging/Bronze table that has the same schema as the source, and a Cleansed/Warehouse/Silver table that has the desired schema for reporting. If you have multiple separate sources, you may have a separate Raw table for each.
Periodically you enhance the schema of the Cleansed table to add new data that has appeared in the Raw table.
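As a sketch of that layered approach (the table names and the Asset column are assumptions, AssetType comes from the example in the question, and OPENJSON requires SQL Server 2016 or later): the Raw table stores the JSON untouched, and the Cleansed table is loaded via OPENJSON, so historical rows that lack newer keys simply yield NULL for those columns.

```sql
-- Raw/Bronze: same shape as the source, JSON kept as-is.
CREATE TABLE dbo.Raw_ClientData (
    ClientId    INT,
    LoadDate    DATE,
    DefinedData NVARCHAR(MAX)     -- the JSON payload
);

-- Cleansed/Silver: the reporting schema, extended when new keys appear.
CREATE TABLE dbo.Clean_ClientData (
    ClientId  INT,
    LoadDate  DATE,
    Asset     NVARCHAR(100),
    AssetType NVARCHAR(100)       -- added in "version 2"; stays NULL for older rows
);

INSERT INTO dbo.Clean_ClientData (ClientId, LoadDate, Asset, AssetType)
SELECT r.ClientId,
       r.LoadDate,
       j.Asset,
       j.AssetType                -- keys missing from old JSON come back as NULL
FROM dbo.Raw_ClientData AS r
CROSS APPLY OPENJSON(r.DefinedData)
     WITH (Asset     NVARCHAR(100) '$.Asset',
           AssetType NVARCHAR(100) '$.AssetType') AS j;
```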
Thanks for sparing some time to look at my question. I am not sure if this is practical or has already been answered, but I don't have the right words to search for, so I will explain the problem here:
We have two databases, "CRM" and "Reporting". The "CRM" database is the primary database where we store all our data.
In the "Reporting" db we have created 5 views that fetch data from "CRM" db tables at run time and do some reporting on them. The separate database exists because we are not allowed to create stored procedures/views in the "CRM" db.
Now, when we run our reports based on the views in the "Reporting" db, they are very slow because of the complex joins in the views and other computation logic. We have done the required indexing and so on, but that doesn't help much.
What I want to do is fetch data from the "CRM" db using the views in the "Reporting" db, store it in tables in "Reporting", and do a simple "SELECT * FROM table" in the reports. The tables will have the exact same structure as the views in the "Reporting" db.
Question: how will data updates/inserts be synced between the "CRM" and "Reporting" tables, given that the data is fetched through the views in order to keep the calculated/report-ready data?
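One common way to do what you describe is to materialize the view output: a scheduled job in the "Reporting" database rebuilds each snapshot table from its view, so the reports query a plain table and the data is only as fresh as the last refresh. A minimal sketch, assuming the placeholder names vw_SalesReport and rpt_SalesReport:

```sql
-- Scheduled refresh (e.g. a SQL Server Agent job): rebuild the snapshot table
-- from the slow view so reports can run a plain SELECT against the table.
-- vw_SalesReport and rpt_SalesReport are placeholder names.
TRUNCATE TABLE Reporting.dbo.rpt_SalesReport;

INSERT INTO Reporting.dbo.rpt_SalesReport
SELECT *
FROM Reporting.dbo.vw_SalesReport;   -- the view pulls from the CRM database
```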
I have a Postgres database with several schemas (all with the same structure). I want to know whether it is possible to change the structure (table names, new columns, etc.) of all the schemas in the same database at once. Is that possible, or what is the purpose of the schemas in a database?
Thanks.
I'm going to focus on the second half of your question, because I think it'll answer the first half (and I'm not sure I understand the first half).
what's the purpose of the schemas in a database?
This confused me when I first switched from MySQL to PostgreSQL. A Postgres schema is essentially the same as a MySQL database. In fact, according to the MySQL Reference Manual:
In MySQL, physically, a schema is synonymous with a database.
That raises the question: what is a PostgreSQL database, then? From the PostgreSQL Documentation:
More accurately, a database is a collection of schemas and the schemas contain the tables, functions, etc. So the full hierarchy is: server, database, schema, table (or some other kind of object, such as a function).
So a PostgreSQL database is essentially a collection of schemas? Seems kind of pointless, why do we need that step in the hierarchy? Let's take a look at the docs for a PostgreSQL schema:
A PostgreSQL database cluster contains one or more named databases. Users and groups of users are shared across the entire cluster, but no other data is shared across databases. Any given client connection to the server can access only the data in a single database, the one specified in the connection request.
A database contains one or more named schemas, which in turn contain tables. Schemas also contain other kinds of named objects, including data types, functions, and operators. The same object name can be used in different schemas without conflict; for example, both schema1 and myschema can contain tables named mytable. Unlike databases, schemas are not rigidly separated: a user can access objects in any of the schemas in the database he is connected to, if he has privileges to do so.
So, in PostgreSQL, a schema contains tables, functions, etc. And a database manages user/group connectivity and access/roles to specific clusters of schemas. Typically, I work under one database and have information broken into schemas to segment information.
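A small sketch of that hierarchy in PostgreSQL (the schema and table names are invented for illustration):

```sql
-- Inside one database, two schemas can each hold a table with the same name.
CREATE SCHEMA sales;
CREATE SCHEMA billing;

CREATE TABLE sales.customer   (id INT PRIMARY KEY, name TEXT);
CREATE TABLE billing.customer (id INT PRIMARY KEY, balance NUMERIC);

-- A qualified name picks the schema explicitly...
SELECT * FROM sales.customer;

-- ...or the search_path decides which schema an unqualified name resolves to.
SET search_path TO billing, public;
SELECT * FROM customer;   -- resolves to billing.customer
```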
How can I sync two databases and do a manual refresh of the entities in either of the databases whenever I want?
Let's say I have two databases, DB1 (prod) and DB2 (dev). I want to update/insert only a few tables from the prod DB to the dev DB. How could I achieve this? Is this possible without a DB link, since I do not have privileges to create a database link?
If you only want to do a manual refresh, set up an import/export/Data Pump script to copy the data across, provided there is not too much data involved. If there is a large amount of data, you could write some PL/SQL to move only the new/changed rows. This will be easier if your data has fields such as created_on/updated_on.
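If you go the "only move new/changed rows" route, here is a rough sketch of the idea in Oracle-style SQL, assuming the prod rows have been made reachable (for example staged via export/import) and that the table has an updated_on column; all names are placeholders:

```sql
-- Upsert only the rows changed since the last refresh.
MERGE INTO dev_table d
USING (
    SELECT id, some_col, updated_on
    FROM prod_staging
    WHERE updated_on > DATE '2024-01-01'    -- timestamp of the last refresh
) p
ON (d.id = p.id)
WHEN MATCHED THEN
    UPDATE SET d.some_col = p.some_col, d.updated_on = p.updated_on
WHEN NOT MATCHED THEN
    INSERT (id, some_col, updated_on)
    VALUES (p.id, p.some_col, p.updated_on);
```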
I have 5 databases with the same schema, and I want to copy all of the data into one database with that same schema.
Alternatively, how can I copy data from *.mdf files into a database?
I am using SQL Server 2005.
Copy Database with T-SQL (sqlauthority):
http://blog.sqlauthority.com/2009/07/29/sql-server-2008-copy-database-with-data-generate-t-sql-for-inserting-data-from-one-table-to-another-table/
Copy Database with Wizard (kodyaz):
http://www.kodyaz.com/sql-server-tools/sql-server-copy-database-wizard.aspx
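If all of the databases live on the same SQL Server instance, the simplest form of the T-SQL approach is a cross-database INSERT ... SELECT (the database and table names below are placeholders):

```sql
-- Copy one table's rows from a source database into the consolidated database.
-- Repeat (or generate dynamically) for each table and each source database.
INSERT INTO CentralDb.dbo.Customer (CustomerId, Name, Email)
SELECT CustomerId, Name, Email
FROM SourceDb1.dbo.Customer;
```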
I'd suggest taking a look at Red Gate SQL Data Compare. That will enable you to merge the data between the two databases and directly control which one wins in any given situation.
As mentioned above, you need to deal with the primary keys as well...
One way to deal with this is to add a "Database ID" to all the tables in the single central version. The central PK becomes the PK from the source table plus the "Database ID". This way you have unique PKs in the central version AND you can tell which database the row came from. This is what sql-hub does (there is a free licence which will let you do this as a one-off task), or you could do the inserts for each database and table yourself in SQL.
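A sketch of that "Database ID" idea with invented names, so that the same key value coming from different source databases no longer collides in the central table:

```sql
-- Central table keyed by (SourceDbId, CustomerId) instead of CustomerId alone.
CREATE TABLE CentralDb.dbo.Customer (
    SourceDbId INT          NOT NULL,   -- which source database the row came from
    CustomerId INT          NOT NULL,   -- the original primary key value
    Name       VARCHAR(100) NOT NULL,
    CONSTRAINT PK_Customer PRIMARY KEY (SourceDbId, CustomerId)
);

-- Load from one source database, tagging every row with its database id.
INSERT INTO CentralDb.dbo.Customer (SourceDbId, CustomerId, Name)
SELECT 1, CustomerId, Name
FROM SourceDb1.dbo.Customer;
```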