Memory communication with DBMS - sql

Is there a way for a TVF/UDF in a DBMS to communicate with an external IDE or a language like C, without writing to a table?
I know there is such a thing as 'memory mapping', a way to share a block of memory:
POSIX mmap() function
Windows OpenFileMapping() function
I am using Windows, so is there a way to communicate with a DBMS using memory mapping or memory sharing from something like C?
But how would you avoid writing to a table, or a file, using just memory?

Shared Memory is available as a data transport to and from SQL providers. You don't have to write any additional code for this if you are using built-in drivers to access your provider. Instead, you would just configure the driver and the server to use this, and your application would have to reside on the same server as your SQL provider.
The ODBC drivers available for Windows support shared memory for SQL activities. To write code for these from C, you would use the ODBC API to communicate with your provider. Here's a link with a function reference.
ODBC Function Summary # MSDN
Also note that there is support for BLOBs for all SQL providers that can handle arbitrary binary data. A list of the types known to ODBC API is available here. There's no strict requirement that your statement results must be expressible in tabular form.
SQL Data Types # MSDN
On the other hand, if you are concerned about communicating with internal SQL entities on your own terms, you might be able to patch something together via extensions to the SQL service you are using. For example, MS SQL Server allows extensions via OLE Automation Procedures or CLR (.NET) integration. You could potentially use these to make something that communicates out-of-band. However, neither of these is easily created with a pure C solution.
Ole Automation Procedures in SQL Server # MSDN
CLR Integration in SQL Server # MSDN
However, I recommend that you avoid doing this, as you will find that you're at the mercy of the environment of the host service and you may not be able to participate in transactions.
If your dataset size requirements are so large that you consider RAM and direct access your best option, your needs would probably be better fulfilled by communicating only the parts that change in the dataset held outside of SQL. In addition, as a shared-memory solution is restricted to one machine, you would probably want to consider splitting the work on your dataset across multiple machines. It is more likely that you would see a performance/productivity improvement by such means than by changing how you reference data in SQL.
Last, it is tough to dictate to a SQL provider that it should avoid using filesystem storage. For MS SQL Server, one possible option is to force tempdb to reside in RAM. Here's a KB article with more details. Other DBMSs may have similar configuration options.
INF: When to use tempdb in RAM
However, the use of disk storage isn't necessarily a cause for concern. I'm unable to find a good example of how SQL providers manage the RAM/filesystem balance, but one good analogue for SQL Server is how Windows is affected by page-file use. Here's a great link that details how Windows behaves at high limits of operation, and how memory use doesn't necessarily translate into overflow to disk. Also note that applications written to run on Windows are likewise adversely affected when the host's operation approaches these limits.
Pushing the Limits of Windows: Virtual Memory # TechNet

The DBMS would have to be designed to work as you want. Regular DBMSs have their own mechanisms for managing data, and while you might be able to communicate with them via shared memory, it is more likely that you won't. The DBMS might hold most of its working data in memory; it depends on the DBMS. Typically, the data will be backed by disk storage.
What you can't do is take an arbitrary DBMS and decree to it that it shall communicate with your process via shared memory. If it is designed to do so, then you can; otherwise, you can't.
Typically, though, you use an ODBC or similar driver to access the DBMS from your application, and those who implement the driver (and the DBMS) dictate how the interprocess communication will occur.

Related

If I'm using PostgreSQL, do I need a server too? Like AWS RDS?

In my CS program, I was told I should learn SQL for my databases.
If I'm using PostgreSQL, do I also need a SQL server to go along with it? Is PostgreSQL a language, a server, or both? Is there even a SQL language or is it only servers?
Background: I downloaded Postgres because hey, that has SQL in the name, it works and I'm under the impression it's a pretty good choice anyway. But I couldn't figure out through their website if it needs a companion server, so I went looking for one and found AWS RDS.
The impression I have is that Postgres is the language and AWS RDS is the server, and they serve different functions. But I'm not sure about any of that.
Seems you're learning too many new topics at the same time.
Ok. I'll try to answer.
SQL stands for 'Structured Query Language' and serves as a 'standard' whose fundamentals are respected, in most ways, by many vendors. Oracle, MySQL (now owned by Oracle), MariaDB and PostgreSQL are some of those vendors.
The main thing I would recommend you identify every time you look at SQL code is whether it belongs to DML or DDL. DML stands for 'Data Manipulation Language' and refers to SQL instructions that modify data. DDL stands for 'Data Definition Language' and defines or alters the structure in which data will be stored.
Another important concept is the atomicity of data manipulation: you can confirm a change or roll it back before it is persisted, which corresponds to doing a 'commit' or a 'rollback'. It's a somewhat advanced concept, but it generally happens "automatically" with standard client configurations. Later, you will need to know about it when programming system modules that interact with databases.
The SQL 'server' refers to the installed/configured software that has the responsibility of managing the persistence of data within some kind of persistence 'instance', allocated on a system with data storage capabilities. AWS implements this service in the cloud, and RDS is the product that supports many SQL flavors to choose from (Oracle, PostgreSQL, etc.)
If you are comfortable with Docker, I recommend you learn the basics, which will help you set up and tear down databases many times; that is useful for developing and testing locally. The following command starts a PostgreSQL database with port 5432 open. You can watch the server log through Docker and use a SQL client to connect. When you press Ctrl+C, everything will be deleted. Of course there are other ways to keep data persistent, but this command is an easy starting point.
$ docker run --rm -p 5432:5432 --name some-postgres-container-name -e POSTGRES_PASSWORD=mysecretpassword postgres:13.3
Side note: it's better to get used to working with specific Docker image versions (not 'latest').
More details on its usage here: https://hub.docker.com/_/postgres/
If I'm using PostgreSQL, do I also need a SQL server to go along with it? Is PostgreSQL a language, a server, or both? Is there even a SQL language or is it only servers lol? I'm genuinely trying to figure this out myself, but basically everything I read is beyond my scope of competence and confuses me more. I'm learning the syntax of SQL well enough, but I'm so confused about everything on the most fundamental level.
By the way, "SQL Server" is Microsoft's SQL flavor, just another one. Don't confuse it with the general concept of having some SQL server configured.
Yes, you can think of PostgreSQL as a language too, one which shares most of its syntax and semantics with other SQL vendors. And yes, there is a 'basic' SQL language shared by and compatible between all vendors; some share more aspects than others. In terms of Venn diagrams, you can think of many circles, one for each of Microsoft SQL Server, Oracle SQL, PostgreSQL, MySQL, etc., sharing the vast majority of their elements, where each element is a SQL instruction.
When dealing with databases in general, keep in mind that they help to model 'real world' scenarios or software systems. SQL lets you 'talk' to implementations of relational databases, which is one kind of database model, but there are others too. ER diagrams help represent the 'structure' of a database in a conceptual manner. I like DBeaver because it has an integrated ER diagram generator which helps you understand the structure of a given database instance.
I have used Postgres and it is an excellent product (and free).
I would install it standalone first. It does come with its own client tools, which you use to communicate with the database server, which runs independently as a service. However, you might be better off installing something like SqlWorkbench as a client tool (which I use). In the config you specify the machine Postgres is running on (which can be your local computer for testing purposes) and the port to connect on. Essentially, the client sends your instructions to Postgres server and the server returns the resultsets associated with your instructions. The client also formats the resultsets into a nice readable "spreadsheet" format with rows and columns.
First I'll try to answer the questions you asked. There is a SQL language, but in practice it is not strictly standardized. There are many offerings for databases and database servers. Many of these are discussed below.
Any database you pick will give you the chance to learn basics of SQL queries and this knowledge will serve you well even if you switch to a different database later.
Specifically, when it comes to PostgreSQL, it is a relational database management system: software that operates as a server. You can install it on your personal computer running Windows, Linux, or macOS. You can also install it on a dedicated server computer, where you'll get better performance and uptime. Further, there are many companies that offer PostgreSQL hosting, including Amazon RDS and Google Cloud, but they're not free.
For a CS student, PostgreSQL installed on your personal computer might be a reasonable choice. But you have lots of options. Read on....
For a CS program, your choice of database will depend on:
what degree of portability you need
how much data you have
how many users will connect to the database
what kinds of jobs you might pursue after graduation
Portability
If you think you want to ship your database with your application, then your best bet is probably SQLite. By some accounts it can handle several million rows of data and still be performant. However, it's not great if you need multiple users to connect to the same database; your data can get corrupted in many multi-user scenarios.
How Much Data and Users
For large data and large users, you'll want to consider the client/server heavy hitters:
PostgreSQL
MySQL/MariaDB
Oracle
SQL Server
These databases will support large quantities of data and many simultaneous connections. But they're not a good idea if you want to distribute the database with your application, and if you want to demonstrate your app, you need to ensure that a connection to a server will be available. All of these databases come with a free version, but the last two have the most restrictions.
After Graduation
Now you're looking to the future and possibly what kind of skills you want to put on your resume. If you think you'll end up in a corporate environment that is already well established, they will likely already have a preferred database and it could be any of the ones listed here (SQLite or the "heavy hitters"). If you want to position yourself as developing apps with low overhead cost, you'll gravitate towards SQLite/PostgreSQL/MySQL. If you think you're going to be some kind of database administrator working in a buttoned-up corporate environment, those companies tend to favor SQL Server and Oracle.
Good luck. Any choice you make will probably be fine. Knowing some flavor of SQL is useful for your future endeavors.
SQL is a language like any other, but it operates on databases. It is called SQL because it works on structured data such as tables (i.e. rows and columns). After reading the PostgreSQL documentation, I think you do not need any separate server installation. You can download it from here. If you face any issues with it, I suggest using MySQL Workbench. Although installation may take longer, it's easy to understand.

Does something like PLV8 exist in Microsoft SQL Server?

Does something like PLV8 (a JavaScript procedural language add-on) exist in Microsoft SQL Server?
You can utilise the CLR integration in MS SQL Server, and write managed code (C# / VB.Net / possibly other languages) that you will be able to execute from within SQL Server.
Having said that, the fact that such a possibility exists doesn't necessarily imply that it should be used. Very few tasks actually benefit from being implemented in managed code compared to T-SQL, such as (the list is by no means complete):
Computationally heavy string manipulations, including regexps (the latter has no alternative in T-SQL);
Communication with objects external to SQL Server (file system, various API endpoints, etc.);
Possibility to implement autonomous transactions.
Before going this way, make sure your team understands the performance and security implications of this approach, as there are many. The aforementioned link gives you a good starting point.
SQL Server is licensed per core, and it's not cheap. As a result, this isn't something the large SQL Server customers who drive SQL Server sales numbers (and therefore feature priority) would ever ask for, because it would multiply the number of very expensive cores needed to handle the same data. Instead, this kind of thing goes into the application or service layers, which aren't licensed per core. These application services are also much easier to scale out to farms of multiple machines.
That said, you can use CLR and xp_cmdshell to accomplish similar tasks, and recent versions have some native JSON processing.

DBD::ODBC vs win32::odbc

I wonder what are the advantages and disadvantages using one over the other. This question originated from an advice I got here: Allocate buffer dynamically for DB query according to the record actual size
I am looking for a list of the important differences (and not an exhaustive list) that will help me to make an educated decision.
I have working experience with Win32::ODBC and can testify genuinely about it. It would be very helpful if someone could share his/her experience on top of the 'dry' documented details.
Additional info:
The author of Win32::ODBC wrote here: http://www.roth.net/perl/odbc/docs/ODBC_Docs.htm - "There are several alternatives to Win32::ODBC available such as the DataBase Interface (DBI) version called DBD::ODBC. This ODBC Perl extension is available across different platforms such as Mac and UNIX. It is a good tool to use for ODBC access into databases although it lacks some of the functionality that Win32::ODBC has."
I wonder if you know what functionality it lacks.
My main reasons with going for the DBI stack are flexibility and the broader population of testers/debuggers. With DBI you are allowing yourself the option of using a driver that is specifically tuned to your particular database engine. Yes, most databases also offer an ODBC driver, but some specific capabilities may be unavailable or more troublesome through that particular API. Additionally, DBI is platform independent, making any possible future porting to another OS that much less trouble. Lastly the population of folks using DBI for their database access far exceeds those using Win32::ODBC, meaning bugs are likely to be found & patched quicker.
Looking at your other linked question I notice you are using Oracle. Using DBI you'd have the choice between using DBD::ODBC or DBD::Oracle under the hood. You can make this choice with a simple change to one of the parameters of the DBI->connect method.
If you are using Oracle's Instant Client, using DBD::Oracle can save you the trouble of downloading/installing the ODBC component on machines that will only need access via Perl. Of course removing the ODBC layer from the equation may have benefits as well.
Update:
Win32::ODBC is a relatively direct conversion of the ODBC Middleware API from C to Perl. If you are willing to limit yourself to ODBC connections on Windows, this does have the relatively minor advantage of giving you more direct control of the ODBC Middleware layer that is controlling your underlying Database. This of course does not imply that the ODBC API is particularly faithful to the API and/or capabilities of the underlying Database.
Again, presuming that you're using Oracle, you seem to have 3 choices:
Win32::ODBC -> ODBC -> Oracle's Driver for ODBC ~> Oracle Client -> Oracle Server
DBI -> DBD::Oracle ~> Oracle Client -> Oracle Server
DBI -> DBD::ODBC ~> ODBC -> Oracle's Driver for ODBC ~> Oracle Client -> Oracle Server
Where the '~>' is to the right of a layer that needed to "shim" one API to fit another.
Now I can understand if you find API fidelity to the ODBC Middleware to be desirable. Personally, I'd rather have the flexibility of DBI & the shorter software stack used by DBD::Oracle. Though I'll also guess that the longer stack involving DBD::ODBC would suit 99+% of all needs, even with two shim layers.
Another difference between DBI & Win32::ODBC is that there are many modules built around the DBI stack. The entire DBIx namespace depends on it. Search for each of these modules on metacpan.org and click on the 'reverse dependencies' link on their page and you'll get a rather sharp picture of the relative value the Perl community has assigned to each.
So if you want an additional, purely selfish, reason: a Perl developer with experience in DBI will also find themselves in much greater demand. Seriously.

When to go for stored procedures rather than embedded SQL

I am confused about when to go for stored procedures rather than embedded SQL in the code.
When I googled, I found these points:
They allow modular programming.
They can reduce network traffic.
They can be used as a security mechanism.
Please tell me, how is network traffic related to this?
Another main advantage of SPs: you can change them (to fix bugs, to extend them) without changing your application code... yet another layer of separation, which can be beneficial.
And also: security. If you use SProcs for everything, all your callers need in terms of permissions on your database is EXECUTE permissions on those SProcs - they don't need direct read/write access to your tables.
It can reduce network traffic in the sense that you send a single command to a stored proc rather than line after line of SQL statements.
Another benefit is that the performance of the queries themselves can be better than embedded SQL, since stored procedures are pre-compiled.
They can reduce network traffic by returning only the required data to the client.
Or to turn it around: a design/coding practice that can waste network traffic is to select a set of data from the DB, return it to the client, and do processing there on only part of the dataset. Obviously, if you are working on only part of the data set, it is better from a traffic perspective not to send the client the data that is not being processed.
It will reduce network traffic in the event that your database server and the server/client running the embedded SQL are separate.
It reduces network traffic because stored procedures are executed on the database server; for embedded SQL running on a separate machine, the database accesses must be handled over the network, thus increasing traffic.
If your embedded SQL and database are on the same machine, it will have no effect on network traffic. An example is a LAMP stack on one machine.
I would firstly question going with stored procs at all...
Unlike actual programming language code, they:
are not portable (every db has its own version of PL/SQL; sometimes different versions of the same database are incompatible - I've seen it)
are not easily testable (not supported by industry-standard unit testing frameworks)
are not easily updatable/releasable (you need to drop/create them - i.e. modify the db to make a change)
do not have library support (why write code when someone else has?)
are not easily integrated with other technologies (try calling a web service from them)
are typically about as primitive as Fortran and thus are inelegant and laborious for getting useful coding done
do not offer debugging/tracing/message-logging etc (some dbs may support this - I haven't seen it though)
etc.
If you have a very database-specific action (eg an in-transaction action to maintain db integrity), or keep your procedures very atomic and simple, perhaps you might consider them.
Caution is advised when specifying "high performance" up front. It often leads to poor choices at the expense of good design and it will bite you much sooner than you think.
Use stored procedures at your own peril (from someone who's been there and never wants to go back). My recommendation is to avoid them like the plague.
It depends.
Are you writing an application that should be run with several databases?
What kind of data operation does your application require? Simple and thin data manipulation?
I suppose this isn't your case, because you tagged your question 'plsql', 'sql', and 'stored-procedures'.
The concept of embedded SQL in PL/SQL is as follows:
Embedded SQL statements incorporate DDL, DML, and transaction control statements within a procedural language program. They are used with the Oracle precompilers. Embedded SQL is one approach to incorporating SQL in your procedural language applications. Another approach is to use a procedural API such as Open Database Connectivity (ODBC) or Java Database Connectivity (JDBC).
In that case there are many important reasons.
The most important are:
The short answer could be that it's easier to write highly efficient code for accessing large amounts of data in an Oracle database in PL/SQL stored procedures than in any other language, because PL/SQL is tightly integrated with the Oracle database.
Read the manual first: Advantages of PL/SQL Stored Procedures
Improved performance
Network traffic (a small amount of information sent over the network). With a single call to a stored procedure, a large amount of data manipulation can be done on the db server, without going back and forth with individual SQL statements and without sending over the network the data needed for the intermediate states of the manipulation itself. This matters most for applications that require intensive, highly efficient data operations. It's not only a matter of which subset of data is sent to the client, but of the volume of data in the intermediate processing states needed to achieve the final result. If the result requires many SQL steps and statements, the advantage is evident.
No compilation is required at call time (stored procedures are compiled when created)
A higher probability that the code is already in the shared pool of the SGA
Memory allocation of the code
Security with definer's rights procedures
Inherited privileges and schema context with invoker's rights procedures
Specific characteristics of PL/SQL and the Oracle database, just to list a few:
Independent units of work with autonomous transactions
DML, transaction management and exception handling inside the db
Calling SQL functions inside SQL
Packaged cursors
Streaming table functions: table functions with the CURSOR expression enable you to stream data through multiple transformations in a single SQL statement
Deterministic functions
Complex dynamic SQL manipulation using the DBMS_SQL API in conjunction with native dynamic SQL (the famous fourth method)
All the modularity reasons (you already mentioned them):
1 Encapsulating calculations
2 Simplifying subqueries used inside an outer SQL statement
3 Combining scalar and aggregate values in the same SQL statement
4 Write once, use many times.
Etc...
Stored procedures may be required to get the performance you need from your application code. The biggest problem with embedded SQL is that all of the business logic typically goes into the application code, which can be hugely inefficient. For example, developers will start doing client-side joins: they call the database to get a set of ID values from one table, then query each of the other tables one record at a time to retrieve the data they require. What can be done in one round trip to the database with a stored procedure may now take hundreds or thousands of round trips with embedded SQL. Each round trip to the database takes a lot of time, not to mention that each query has to be compiled, tremendously increasing the load on the database server.
If your application is a low volume application with few users this can work. High volume applications with lots of users can quickly overload even large database servers and cause severe performance problems, even to the point where the application stops working.

How to make a db schema such that it is supported by all db management systems

Is there a Windows XP utility to make a database such that it's supported by SQL Server, Oracle, and other db management systems?
The database schema is very large, so I would like to know what to use to make it portable from SQL Server to Oracle if future demands require that change.
In short, what you seek is nearly impossible to do successfully. Every database product has enough quirks that such a database would not perform well and would be too limiting in terms of the features you were able to use. That is, you have to play the lowest-common-denominator game, restricted to the features that all the products you want to support implement. A far better solution is to abstract the data layer into its own library accessed via interfaces so that you can swap out your data layer. ORMs, as Rafael E. Belliard suggested, make this simpler, but it can also be done manually.
I would recommend building your database using an ORM like Hibernate for Java (or NHibernate for .NET). This would allow you to seamlessly transition from one database type to the other with little to no issues. They would allow you to logically create the database schema without a specific database in mind, which you could then move from one database to the other.
I have created applications which change from SQL Server to MySQL to Oracle to MS Access to SQLite easily (clients love that flexibility).
However, you would need to know your way around programming...