.NET DLL for natural language to SQL/SPARQL

I'm trying to build an interface for my tool to query a semantic/relational DB using C#/.NET.
I now need a layer above the query layer that converts NL input to SQL/SPARQL. I have read through papers on NLIs, and building such a layer myself would be a heavy load for my project; besides, it's not the main target, it's an add-on.
I don't care whether the DLL supports only guided input or free text and handles unmatched input; I just need a DLL to start from and add some code to.
Whether it supports both SQL and SPARQL doesn't really matter, because I can convert one to the other within my project's domain (something local).
Any ideas on available DLLs?

You could try my Natural Language Engine for .NET. Sample project on Bitbucket and Nuget packages available.
Using TokenPhrase in your rules lets you match any otherwise-unmatched strings in the input, or quoted strings.
The next revision, which I'll be releasing soon, also supports 'production rules' and operator precedence, which make it even easier to define your grammar.
Uniquely it delivers strongly-typed .NET objects and executes your rules in a manner similar to ASP.NET MVC with controllers, dependency injection and action methods. All rules are defined in code simply by writing a method that accepts the tokens you want to match. It includes tokens for common things like numbers, distances, times, weights and temporal expressions including finite and infinite temporal expressions.
I use it in various applications to build SQL queries so it shouldn't be too hard to use it to create SPARQL queries.
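A purely hypothetical sketch of what a rule method in that style might look like; the class and token type names below are illustrative stand-ins, not the library's actual API:

// Hypothetical sketch only: NumberToken, TokenPhrase and the rule class are
// illustrative stand-ins, not the engine's real types.
public class OrderQueryRules
{
    // A rule is just a method whose parameters describe the tokens to match,
    // e.g. "orders over 100 for 'Acme'".
    public string OrdersOver(NumberToken amount, TokenPhrase customer)
    {
        // Build the SQL (or SPARQL) for the matched sentence.
        return $"SELECT * FROM Orders WHERE Total > {amount.Value} AND Customer = '{customer.Text}'";
    }
}

// Minimal stand-in token types so the sketch compiles on its own.
public record NumberToken(decimal Value);
public record TokenPhrase(string Text);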

Check out Kueri.me
It's not a DLL but rather a server exposing an API, so currently it doesn't have a wrapper specifically for C#. There's an API exposed via XML-RPC that you can integrate with from any language.
It converts English to SQL and gives Google-style suggestions if you want to implement a search box (it supports several DB providers, like MySQL, MSSQL, etc.).

FlatBuffer schema design for frameworks

I'm looking for advice on structuring FlatBuffer schemas for a framework which allows users to extend the data types defined by the framework, but also allows the framework developers to add new fields when new versions of the framework are published.
My original thinking was that when you create a project using this framework, it would generate several FlatBuffer schema files which you could then edit for your specific project. You could then compile the schemas and start developing code using the framework APIs.
However, this becomes a problem when the framework developers decide to add fields to the base types. As you probably know, FlatBuffers requires that any additional fields be appended to the end (or at least have a higher ID than other fields). So there is a conflict between the additions made by the framework developer and the framework user.
One possible solution would be to have a set of 'non-user-extensible' types that are owned by the framework creator, and which should not be modified by users of the framework; and these types would then be embedded within the data types defined by the framework user. However, given the restrictions on fields changing size, I am not sure if this would even work.
I'm also willing to hear alternatives to using flatbuffers if it turns out that there is no good solution otherwise.
To have open-ended extension like that, you should really have the framework authors and users work in two separate tables, where one can own the other. There is no good way to extend a single table if all contributors aren't sharing the schema in source control.
If these extensions must be in a single object for whatever reason, then Protocol Buffers is more flexible than FlatBuffers, since it doesn't require adjacent field ids. You can simply say that all field ids >=1000 are for framework users, for example.
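For instance, a minimal sketch of that convention in a Protocol Buffers schema (the message and field names are illustrative):

// Framework-owned fields keep ids below 1000; fields added by framework
// users in their copy of the schema start at 1000, so later framework
// releases can keep adding low-numbered fields without colliding.
syntax = "proto3";

message Entity {
  // Framework-owned fields: ids 1-999.
  string id   = 1;
  string name = 2;

  // Framework-user fields: ids >= 1000.
  int32 user_score = 1000;
}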
In retrospect (answering my own question two years later), it seems that FlatBuffers was not the right choice for my use case. These days I'm using a combination of msgpack (in cases where I care about byte-size) and JSON (in cases where I don't) and I'm pretty happy with each.

REST API filter operator best practice

I am building a REST API that uses a filter parameter to control search results. E.g., one could search for a user by calling:
GET /users/?filter=name%3Dfoo
Now, my API should allow many different filter operators: numeric operators such as equals, greater than, and less than; string operators like contains, begins with, or ends with; and date operators such as year of or timediff. Moreover, AND and OR combinations should be possible.
Basically, I want to support a subset of the underlying MySQL database operators.
I found a lot of different implementations (two good examples are Google Analytics and LongJump) that seem to use custom syntax.
Looking at my requirements, I would probably design a custom syntax pretty similar to the MySQL operator syntax.
However, I was wondering if there are any best practices established that I should follow and whether I should consider anything else. Thanks!
You need an already existing query language; don't try to reinvent the wheel! With REST this is a complicated and not fully solved issue. There are some REST constraints your application must fulfill:
uniform interface / hypermedia as the engine of application state:
You have to send hypermedia responses to your clients, and they have to follow the hyperlinks given in those responses, instead of building the requests on their own. So you can decouple the clients from the structure of the URI.
uniform interface / self-descriptive messages:
You have to send messages annotated with semantics, so you can decouple the clients from the data structure. The best way to do this is RDF with, for example, linked open data vocabs. If you don't want to use RDF, then the second best solution is a vendor-specific MIME type, so your messages will be self-descriptive, but the clients need to know how to parse your custom MIME type.
To describe simple search links, you can use URI templates; for example GET /users/{?name} expects a name parameter in the query string. You can use hydra:IriTemplateMapping from the Hydra vocab to add semantics to parameters like name.
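As a rough illustration, a Hydra/JSON-LD description of such a template could look something like this (the property IRI is just an example):

{
  "@context": "http://www.w3.org/ns/hydra/context.jsonld",
  "@type": "IriTemplate",
  "template": "/users{?name}",
  "variableRepresentation": "BasicRepresentation",
  "mapping": [
    {
      "@type": "IriTemplateMapping",
      "variable": "name",
      "property": "http://schema.org/name",
      "required": false
    }
  ]
}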
Describing ad-hoc queries is a hard task. You have to describe somehow what your query can contain.
You can choose a URI query language and stick with URI templates and probably Hydra annotations. There are many already existing URI query languages, like HTSQL, the OData query syntax (people don't like that one), etc.
You can choose an existing query language and send it in a single URI parameter. This can be anything you want, for example SQL, SPARQL, etc. You have to teach your client to generate that parameter. You can create your own vocab to describe the constraints of the actual query. If you don't need complicated things, this should not be a problem. I don't know of already existing query-structure-describing vocabs, but I have never looked for them.
You can choose an existing query language and send it in the body of a SEARCH request. AFAIK SEARCH is not cached and not supported by most HTTP clients; it was defined by WebDAV. You can describe your query with the proper MIME type, and you can use the same vocab as in the previous solution.
You can use an RDF query solution, for example a SPARQL endpoint, or triple pattern fragments, etc. Then your queries will contain the semantic metadata, not your link descriptions. With SPARQL you don't necessarily need a triple store; you can translate the queries on the server side to SQL, or whatever you use. You can probably use SPIN to describe query constraints and query templates, but that is new to me too. There might be other solutions to describe SPARQL query structures.
So to summarize: if you want a real REST solution, you have to describe to your clients how they can construct the queries and what parameters and logical operators they can use. Without query descriptions they won't be able to generate, for example, an HTML form for the user. If you don't want a REST solution, then pick a query language, write a builder on the client, write a parser on the server, and that's all.
The Open Data Protocol (OData)
You can check BreezeJS too and see how this protocol is implemented for Node.js + MongoDB with the breeze-mongodb module, and for a .NET project using Web API and Entity Framework with the Breeze.ContextProvider DLL.
By embracing a set of common, accepted delimiters, equality comparison can be implemented in a straightforward fashion. Setting the value of the filter query-string parameter to a string using those delimiters creates a list of name/value pairs which can be parsed easily on the server side and used to enhance database queries as needed. You can use the delimiters of your choice, say a pipe ("|") to separate individual filter phrases for OR, an ampersand ("&") to separate individual filter phrases for AND, and a double colon ("::") to separate names and values.
This provides a unique-enough set of delimiters to support the majority of use cases and creates a user-readable query-string parameter. A simple example will serve to clarify the technique. Suppose we want to request users with the name "Todd" who live in "Denver" and have the title of "Grand Poobah". The request URI, complete with query string, might look like this:
GET http://www.example.com/users?filter="name::todd&city::denver&title::grand poobah"
The double colon ("::") separates the property name from the comparison value, enabling the comparison value to contain spaces, which makes it easier to parse the delimiter from the value on the server.
Note that the property names in the name/value pairs match the names of the properties that would be returned by the service in the payload.
Case sensitivity is certainly up for debate on a case-by-case basis, but in general, filtering works best when case is ignored. You can also offer wildcards as needed using the asterisk ("*") as the value portion of the name/value pair.
For queries that require more than simple equality or wildcard comparisons, introducing operators is necessary. In this case, the operators themselves should be part of the value and parsed on the server side, rather than part of the property name. When complex query-language-style functionality is needed, consider introducing the query concepts from the Open Data Protocol (OData) Filter System Query Option specification (http://www.odata.org/documentation/odata-version-4-0/)
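To make the server-side parsing concrete, here is a minimal C# sketch (the class name and structure are mine, not part of the answer) that splits such a filter value into OR groups of AND-ed name/value pairs; operator handling could be layered on the same idea:

using System;
using System.Collections.Generic;
using System.Linq;

public static class FilterParser
{
    // Parses "name::todd&city::denver|title::grand poobah" into OR groups,
    // each group being a dictionary of AND-ed name/value pairs.
    public static List<Dictionary<string, string>> Parse(string filter)
    {
        return filter
            .Split(new[] { '|' }, StringSplitOptions.RemoveEmptyEntries)      // OR groups
            .Select(group => group
                .Split(new[] { '&' }, StringSplitOptions.RemoveEmptyEntries)  // AND phrases
                .Select(phrase => phrase.Split(new[] { "::" }, 2, StringSplitOptions.None))
                .Where(parts => parts.Length == 2)
                .ToDictionary(parts => parts[0].Trim(), parts => parts[1].Trim(),
                              StringComparer.OrdinalIgnoreCase))              // ignore case, as suggested above
            .ToList();
    }
}

// FilterParser.Parse("name::todd&city::denver&title::grand poobah") yields one
// group: { name = "todd", city = "denver", title = "grand poobah" }.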
There seem to be a lot of standards (like OData), but many are quite complicated in that they introduce new syntax.
For simple multi-filtering, the following format avoids polluting the parameter namespace while still standing on top of existing web technology:
GET /users?filter[name]=John&filter[title]=Manager
It's easily readable, and on the backend, languages like PHP will receive it as an array of filters to apply.
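Other backends can read the same shape straight out of the query string; here is a minimal ASP.NET Core sketch (the /users endpoint is illustrative, and it assumes a minimal-API project with implicit usings) that collects the filter[...] parameters into a dictionary:

// Program.cs; query-string keys arrive literally as "filter[name]", "filter[title]", ...
var builder = WebApplication.CreateBuilder(args);
var app = builder.Build();

app.MapGet("/users", (HttpContext ctx) =>
{
    var filters = new Dictionary<string, string>();
    foreach (var kv in ctx.Request.Query)
    {
        if (kv.Key.StartsWith("filter[") && kv.Key.EndsWith("]"))
        {
            var field = kv.Key["filter[".Length..^1];   // e.g. "name", "title"
            filters[field] = kv.Value.ToString();
        }
    }
    // ...apply the filters to the data source; here we just echo them back.
    return Results.Ok(filters);
});

app.Run();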
A possible standard would be SCIM, which is adopted by some commercial products. But it's not distinguished by brevity. For a pet project I used this:
= equal
! not equal
* like
< smaller
> greater
& bitwise and 
| bitwise or
^ bitwise xor
~ in comma separated value list
Examples
So GET /user?name=*An* would get all users whose name starts with An (the leading * selects the like operator, the trailing * is a wildcard), and GET /user?name=~Anna,Bertha would get those two users.
Not yet a standard but who knows...
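A rough sketch (my own naming, not part of any standard) of how those operator prefixes could be decoded on the server side:

using System;

public record Condition(string Field, string Op, string Value);

public static class OpParser
{
    // Decodes a query-string pair such as name=*An* or age=>30 into a
    // field/operator/value triple; the first character selects the operator,
    // anything else means equality. Bitwise operators are omitted for brevity.
    public static Condition Parse(string field, string raw)
    {
        if (string.IsNullOrEmpty(raw)) return new Condition(field, "=", raw ?? "");
        return raw[0] switch
        {
            '!' => new Condition(field, "<>",   raw[1..]),
            '*' => new Condition(field, "LIKE", raw[1..].Replace('*', '%')),  // *An* -> LIKE 'An%'
            '<' => new Condition(field, "<",    raw[1..]),
            '>' => new Condition(field, ">",    raw[1..]),
            '~' => new Condition(field, "IN",   raw[1..]),                    // comma-separated list
            _   => new Condition(field, "=",    raw)
        };
    }
}

// OpParser.Parse("name", "*An*")          -> (name, LIKE, "An%")
// OpParser.Parse("name", "~Anna,Bertha")  -> (name, IN, "Anna,Bertha")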

Automatically generating POCO classes

Today I was checking out a few technologies: T4 templating, AutoMapper, and some mini ORMs: PetaPoco, SqlFu, OrmLite.
I understand the gist of what these technologies provide. I'm currently working on a 3-tier system, and I would have loved to replace the DAL (the data access layer located on its own data server) and integrate it with a mini ORM. However, I will be making no such plans for now. We currently use .NET Remoting (which predates WCF).
So instead of replacing whatever is on the data server, I'd like to build on one of these new technologies on the application server.
I've done research on how Entity Framework can automatically generate POCO classes based on the context (a step done manually after building the EF model), and I was wondering if I can do the same without using EF.
So here's the facts on what's currently happening:
Send a SQL statement (or stored proc) to the DAL to execute
Retrieve a DataSet or DataTable back to the application through a TCP channel
My question is, is it possible to automatically generate a dynamic POCO class using keywords "var" and "dynamic" based on the values sent back from the DataSet and do dynamic mapping onto it during runtime? Would any of the technologies mentioned above help? Or do I have to manually create the POCO class first, and do a mapping on it?
It seems a bit redundant for me to manually create a POCO class and map it to a backend SQL table if the application could be aware of what the POCO class is supposed to have. For example, if I update a table on the backend, I'd have to update the POCO class associated with it as well. I'd love for this to be automatic.
If you know the data sets at compile time, then T4 might be an option. You can write a T4 script that downloads the database schema, and constructs strongly-typed entity classes and database reads/write methods.
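As a rough illustration of that idea (plain C# rather than an actual .tt template; the connection string handling and the SQL-to-CLR type map are simplified placeholders), the generation step boils down to reading the schema and emitting class source:

using System.Data;
using System.Data.SqlClient;   // assumes SQL Server; swap the provider as needed
using System.Text;

public static class PocoGenerator
{
    // Very small SQL-to-CLR type map; extend for real use.
    static string Map(string sqlType) => sqlType switch
    {
        "int" => "int",
        "bigint" => "long",
        "bit" => "bool",
        "datetime" or "datetime2" => "DateTime",
        "decimal" or "money" => "decimal",
        _ => "string"
    };

    public static string Generate(string connectionString, string table)
    {
        using var con = new SqlConnection(connectionString);
        con.Open();
        // GetSchema("Columns") returns TABLE_NAME / COLUMN_NAME / DATA_TYPE rows.
        DataTable cols = con.GetSchema("Columns", new[] { null, null, table, null });

        var sb = new StringBuilder($"public class {table}\n{{\n");
        foreach (DataRow row in cols.Rows)
            sb.AppendLine($"    public {Map((string)row["DATA_TYPE"])} {row["COLUMN_NAME"]} {{ get; set; }}");
        sb.AppendLine("}");
        return sb.ToString();   // the T4 script would write this out as a .cs file
    }
}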
As far as late-bound (runtime) classes go, one option is to use the runtime typing provided by CustomTypeDescriptor. You can pass arrays of objects back and forth from the server, and use reflection or other techniques to infer the type.
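For the runtime route, a simpler illustration than a full CustomTypeDescriptor (a deliberate simplification, not this answer's exact approach) is to project each DataRow onto an ExpandoObject so callers get dynamic member access:

using System.Collections.Generic;
using System.Data;
using System.Dynamic;

public static class DynamicMapper
{
    // Projects each DataRow onto an ExpandoObject keyed by column name,
    // so callers can use dynamic member access at runtime.
    public static IEnumerable<dynamic> ToDynamic(DataTable table)
    {
        foreach (DataRow row in table.Rows)
        {
            var bag = (IDictionary<string, object>)new ExpandoObject();
            foreach (DataColumn col in table.Columns)
                bag[col.ColumnName] = row.IsNull(col) ? null : row[col];
            yield return bag;
        }
    }
}

// Usage:
//   dynamic user = DynamicMapper.ToDynamic(dataTable).First();
//   Console.WriteLine(user.Name);   // resolved at runtime; no compile-time checks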
I think it should be clear that #1 is preferable, if you know the types at compile time (which it sounds like in your case here). Runtime and dynamic should only be a last resort, as it circumvents a lot of valuable compile-time type checks.
Really, I would recommend using one of the micro ORMs like Dapper, etc, if you don't want to use the full Entity Framework. That is, unless you really want to re-invent the wheel.

Using strongly typed DataSets in a VB.NET project

Is using strongly typed DataSets a good idea?
Currently I am working on a project developed using VB.NET in Visual Studio 2010.
Previously it used SQL queries directly through SqlCommand from System.Data.SqlClient, but I have since shifted everything to strongly typed DataSets and started using TableAdapters everywhere.
Now I just want to ask: is this approach good for a project?
Or should I shift back to the old way, using just SqlCommands?
Or is there a better way to structure the SQL data access? It's an ERP, and most of the code is data access.
We use strongly typed datasets all the time now.
After shifting to this approach it felt really wrong to have SQL queries in code instead of having it done by the table adapter. But there is a bit of overhead with DataSets, so I guess both ways are good for different solutions.
It's really nice to have IntelliSense on all field names, and if you change a TableAdapter so it returns something different, you get design-time errors everywhere you need to change the code to reflect the change, instead of finding out at runtime when the customer is running the program.
There are so many win-win things with strongly typed DataSets that I'll never go back.
Table adapters make a lot of mess with bigger databases, and updating the table structure also causes confusion.
I would recommend using an auto code generator for the CRUD operations.
To me your old pattern looks better than switching altogether to table adapters and strongly typed datasets.
If you ever want to move your data across the wire to other platforms (silverlight, web services, wcf services, etc), then using any kind of dataset will box you into a corner.
The way that we have resolved this is to have classes whose list of properties match the database exactly. To move the data in and out of the database, we use reflection to either match stored procedure parameters or generate dynamic SQL statements, depending on the circumstance and platform.
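As a sketch of the reflection half of that approach (the class and table names are illustrative, and real code would add escaping, identity handling, and so on), building a parameterized INSERT from an object's public properties looks roughly like this:

using System;
using System.Data.SqlClient;
using System.Linq;
using System.Reflection;

public static class ReflectionDal
{
    // Builds "INSERT INTO User (Name, City) VALUES (@Name, @City)" from the
    // object's public properties and binds each property value as a parameter.
    public static SqlCommand BuildInsert(object entity, string table)
    {
        PropertyInfo[] props = entity.GetType()
            .GetProperties(BindingFlags.Public | BindingFlags.Instance);

        string columns = string.Join(", ", props.Select(p => p.Name));
        string values  = string.Join(", ", props.Select(p => "@" + p.Name));

        var cmd = new SqlCommand($"INSERT INTO {table} ({columns}) VALUES ({values})");
        foreach (var p in props)
            cmd.Parameters.AddWithValue("@" + p.Name, p.GetValue(entity) ?? DBNull.Value);
        return cmd;
    }
}

// Usage with a hypothetical POCO whose properties match the table exactly:
//   var cmd = ReflectionDal.BuildInsert(new User { Name = "Todd", City = "Denver" }, "User");
//   cmd.Connection = connection; cmd.ExecuteNonQuery();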
When a database table is changed, the developer making the change is also responsible for updating the class structure and vice-versa.
In order to reduce the amount of hand-coding required, we use the code generation capabilities of CodeSmith to generate classes from the database and create the basic implementations of our standard add/update stored procedures that require field enumeration.
As an added benefit, this approach removes the tight link between the database and business object structure. We are able to use our same data access code and business object classes against SQL Server, Oracle, Sqlite, and SqlServerCE databases. This code is used to create applications in Windows, PocketPC, Web, iPad, and Android apps; all of the mobile apps use local databases specific to the platform, but using the common data access code.
It is a bit more work to setup initially, but it will pay significant dividends in the long run.

Where is the API reference for NHibernate?

I may be going mental, but I cannot find any API reference material for NHibernate. I've found plenty of manuals, tutorials, ebooks, etc., but no API reference. I saw the chm file on the NHibernate SourceForge page, but it doesn't seem to work on any of my PCs (different OSes).
Can someone please point me in the right direction?
I just found this one:
http://web.archive.org/web/20141001063046/http://elliottjorgensen.com/nhibernate-api-ref/index.html
It doesn't seem to be official, but at least it looks like an API reference... unlike the official reference, which mostly describes concepts and mappings without any information about classes and members.
If you're on Windows, get ILSpy and point it at NHibernate.dll. It's not quite the same as real API documentation, but it's not half bad.
There is no class reference publicly available on the Internet as far as I know. You may build it from the source: clone it, build the NHibernate.sln solution, then go into the doc folder, make sure you have the prerequisites indicated in the reference\readme.txt file, and run nant doc. This will generate the class reference in the build folder.
Otherwise, the most commonly used APIs are not wide, and most of them are XML-documented, with IntelliSense working in Visual Studio. The reference documentation has the advantage of giving more context, probably helping you avoid pitfalls like believing ISession.Update is to be used for updating entities (it is not; you do not need it unless you use detached entities, or entities coming from another session).
The official reference documentation is on https://nhibernate.info.
Sub-links:
Global documentation list
Reference (what I mostly use, especially the following sub-parts)
Configuration
Mapping - basic / entities. (Add the mapping xsd definition file to any of your solution folders to let VS know about it and give you IntelliSense in your hbm mappings.)
Mapping - collections
Querying - general. Do not miss the named queries feature in the IQuery interface.
Querying APIs (a short side-by-side sketch follows this list):
HQL. I mostly use HQL with named queries, in mappings, for queries not dynamically built. They get parsed and validated when building session factory, which normally occurs at application startup, so it is almost as good as compile time validation. Checks log4net logs to get detailed reasons of named query parsing failures.
Criteria API. I view it as the historical way of dynamically building queries in code, to be preferred over constructing HQL strings.
QueryOver API. Based on the Criteria API, with lambda expression support giving compile-time validation of queried entity names. Should be preferred over the Criteria API in my opinion.
Linq API. Great for dynamically built queries. Bear in mind that its implementation translates your queries to HQL. With complex queries, it may generate unsupported HQL constructs. Knowing HQL's capabilities allows a better understanding of how to write a supported Linq query for complex cases. (For example, for a complex order by, it is better to use an explicit Linq sub-query in the OrderBy rather than a collection mapped on your queried entity.)
Native SQL. Well, quite self-explanatory. To be used, for example, when you need some SQL-specific feature not available through the other querying APIs (SQL Server full-text, select for xml, ...) and you do not wish to extend those other APIs. You may also call stored procedures. When using native SQL, I favor SQL named queries.
Modifying data, from Updating objects to Flush, and Exception handling.
Performances.
Batch fetching. About this, you may read my post here for a detailed explanation of why lazy loading can be very efficient with NHibernate, thanks to batch fetching. This single feature will always cause me to prefer NHibernate over Entity Framework, until EF stops lacking it.
Second level cache. Another great NHibernate feature, lacking native support in EF. Beware, you must use transactions to leverage this. It allows NHibernate to automatically evict cached entries for you as you change data through your application process. Without transactions, NHibernate will disable the second level cache as soon as you start changing data, to avoid letting the cache yield you stale data.
Interceptors. This is one way among many of customizing NHibernate's inner workings. NHibernate is very strong at allowing you to extend it. You may also add your own HQL extensions as here, your own linq2NH extensions as here (all are answers from me). And there are other ways; see this list for linq2NH extensibility solutions.
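To give a feel for the querying APIs listed above, here is a small side-by-side sketch using a hypothetical mapped entity Cat with a Name property (session is an open ISession):

using NHibernate;
using NHibernate.Criterion;
using NHibernate.Linq;
using System.Linq;

public static class QueryExamples
{
    public static void Run(ISession session)
    {
        // HQL
        var hql = session.CreateQuery("from Cat c where c.Name = :name")
                         .SetParameter("name", "Max")
                         .List<Cat>();

        // Criteria API
        var criteria = session.CreateCriteria<Cat>()
                              .Add(Restrictions.Eq("Name", "Max"))
                              .List<Cat>();

        // QueryOver (lambda-based, compile-time checked property names)
        var queryOver = session.QueryOver<Cat>()
                               .Where(c => c.Name == "Max")
                               .List();

        // Linq to NHibernate (translated to HQL under the hood)
        var linq = session.Query<Cat>()
                          .Where(c => c.Name == "Max")
                          .ToList();
    }
}

// Hypothetical entity used by the sketch; it would need a matching mapping.
public class Cat
{
    public virtual int Id { get; set; }
    public virtual string Name { get; set; }
}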
Moreover, a class reference would very likely be close to the Hibernate one. There are so many internal APIs supporting the implementation that it is not very usable on its own.
Why are such APIs not hidden (internal, private, ...)? Not hiding them is required to allow NHibernate's great extensibility capabilities. Those capabilities are a must-have in my opinion. In contrast, it is very hard to work around shortcomings in some other .NET projects, due to the lack of extensibility they suffer from. (MVC FileResult and the TweakDispositionAsInline I had to use instead of just being able to override some method, or trying to extend linq-to-entities, see this.)
There is a good book that covers a lot, and there is the HTML documentation on the site (which also comes as a book).
(The book would be Manning's NHibernate in Action; a little outdated, but a good start.)
Here is the link to the online reference