Extract labels from serialized array using SQL

I do not have control over how this data is stored (I know normalized data would be better for SQL), because it is saved by the WordPress GravityForms plugin. The plugin uses a serialized array to define the question id (field_id) and the question label (label). My goal is to extract these values in the following format:
field_id label
1 1. I know my organization’s mission (what it is trying to accomplish).
2 2. I know my organization’s vision (where it is trying to go in the future).
Here is the serialized array.
Can anyone please provide a specific example as to how to parse these values out with sql?

A specific example, no. This kind of stuff is complex. If you are working with straight JSON-formatted data, there are several options, none of which are simple.
You can build your own parser. Yuck.
You can upgrade everything you have to the just-released SQL Server 2016, and hope that the built-in JSON tools do what you need (I've heard iffy things about them, but don't know what their final form is like. Also, updating all your database servers right now? Oh sure.)
Phil Factor over on SimpleTalk built a json T-SQL parser (https://www.simple-talk.com/sql/t-sql-programming/consuming-json-strings-in-sql-server/). It looks horrible and may run poorly, but it would do the needful.
Buried in the comments of that article are links to a CLR tool that John Galt built (at https://github.com/jgcoding/J-SQL). I have used this successfully, though I haven't done anything too complex. (If your JSON is relatively simple, this could do the trick.)
There are other json parsers for SQL out there, some free, some for sale. The key thing would be to not try and write your own, but rather find and use someone else's solution that addresses your requirements.
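If the stored value does turn out to be plain JSON, one way to follow the "use someone else's parser" advice is to read the column out and parse it outside SQL with an existing library. Below is a minimal Python sketch. The payload shape is hypothetical: the key names `fields`, `id` and `label` are assumptions, since the actual serialized array is not shown in the question and may be structured differently.

```python
import json

# Hypothetical payload; the key names "fields", "id" and "label" are assumptions.
raw = '''
{"fields": [
  {"id": 1, "label": "1. I know my organization's mission."},
  {"id": 2, "label": "2. I know my organization's vision."}
]}
'''

def extract_labels(serialized):
    """Return (field_id, label) pairs from a JSON document shaped like `raw`."""
    doc = json.loads(serialized)
    return [(field["id"], field["label"]) for field in doc["fields"]]

rows = extract_labels(raw)
for field_id, label in rows:
    print(field_id, label)
```

The resulting pairs could then be written back into a normalized table if they need to be queried with plain SQL.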

Related

Databricks: Best practice for creating queries for data transformation?

My apologies in advance for sounding like a newbie. This is really just a curiosity question I have as an outsider observing my team clash with our client. Please ask any questions you have, and I will try my best to answer it.
Currently, we are storing our transformation queries in a DynamoDB table. When needed, we pull into Databricks and execute the query. Simple as that. Our client has called this out as “hard coding” (more on that soon)
Our client has come up with an alternative that involves creating JSON config files containing the transformation rules (all tables/attributes required, target table names, Alias names, join keys, etc. etc.). From here, the SQL query is dynamically created. This approach is still “hard coding” since these config files would need to be manually edited anytime there is a change in the rules.
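To make the config-driven approach concrete, here is a rough Python sketch of what "the SQL query is dynamically created" might look like. Everything in it is an assumption for illustration: the config keys (`target`, `source`, `joins`, `columns`), the table names, and the `build_query` helper are hypothetical and not taken from the client's actual design.

```python
import json

# Hypothetical rule file; every key and table name here is an assumption.
config_text = """
{
  "target": "sales_summary",
  "source": {"table": "orders", "alias": "o"},
  "joins": [
    {"table": "customers", "alias": "c",
     "left_key": "o.customer_id", "right_key": "c.id"}
  ],
  "columns": [
    {"expr": "c.name", "alias": "customer_name"},
    {"expr": "o.total", "alias": "order_total"}
  ]
}
"""

def build_query(config):
    """Render a CREATE TABLE ... AS SELECT statement from a rule dictionary."""
    select_list = ", ".join(
        f"{c['expr']} AS {c['alias']}" for c in config["columns"]
    )
    source = config["source"]
    lines = [
        f"CREATE OR REPLACE TABLE {config['target']} AS",
        f"SELECT {select_list}",
        f"FROM {source['table']} {source['alias']}",
    ]
    # One JOIN line per configured join; identifiers are trusted, not validated.
    for join in config.get("joins", []):
        lines.append(
            f"JOIN {join['table']} {join['alias']} "
            f"ON {join['left_key']} = {join['right_key']}"
        )
    return "\n".join(lines)

query = build_query(json.loads(config_text))
print(query)
```

Note that any new kind of rule (filters, aggregations, window functions) would need both a new config key and new generator code, which is where the extra complexity comes from.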
The way I see this: I think storing the transform rules in JSON is more business-user friendly, but that's about where I see the pros end. It brings much more complexity to the code and will likely need continuous development to support new queries. Also, I don't see any way to prevent "hard coding". The client business leads seem to think there is some magical tool to convert plain English text to complex SQL queries.
I just wanted to get some experts' thoughts on this. Which solution is better, or is there another approach that should be taken?

MongoDB or SQL for text file?

I have a 25GB text file with the following structure (headers):
Sample Name Allele1 Allele2 Code metaInfo...
So it is just one table with a few million records. I need to put it into a database because I sometimes need to search the file for, say, a specific sample, and then get back the whole row exactly as it appears in the file. This would be a basic application. What is important? The file is constant. There is no need for an insert function because all samples are finished.
My question is:
Which DB would be better in this case, and why? Should I put the file in a SQL database, or would MongoDB be a better idea? I need to learn one of them and I want to pick the best option. Could someone give advice? I didn't find anything specific on the internet.
Your question is a bit broad, but assuming your 25GB text file in fact has a regular structure, with each line having the same number (and data type) of columns, then you might want to host this data in a SQL relational database. The reason for choosing SQL over a NoSQL solution is that the former tool is well suited for working with data having a well defined structure. In addition, if you ever need to relate your 25GB table to other tables, SQL has a bunch of tools at its disposal to make that fast, such as indices.
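As a rough illustration of the relational route, the sketch below loads a delimited file into SQLite and indexes the sample column using only Python's standard library. The tab delimiter, the four-column layout, and the sample values are assumptions standing in for the real file.

```python
import csv
import io
import sqlite3

# Stand-in for the 25GB file; the tab-separated layout and values are assumptions.
sample_file = io.StringIO(
    "Sample Name\tAllele1\tAllele2\tCode\n"
    "S001\tA\tG\t0.91\n"
    "S002\tC\tC\t0.88\n"
    "S001\tT\tT\t0.75\n"
)

conn = sqlite3.connect(":memory:")  # use a file path for real data
conn.execute(
    "CREATE TABLE genotypes "
    "(sample_name TEXT, allele1 TEXT, allele2 TEXT, code TEXT)"
)

reader = csv.reader(sample_file, delimiter="\t")
next(reader)  # skip the header line
# executemany pulls rows from the reader one at a time,
# so the whole file is never held in memory at once.
conn.executemany("INSERT INTO genotypes VALUES (?, ?, ?, ?)", reader)

# The index is what keeps lookups by sample fast on millions of rows.
conn.execute("CREATE INDEX idx_sample ON genotypes (sample_name)")
conn.commit()

rows = conn.execute(
    "SELECT * FROM genotypes WHERE sample_name = ? ORDER BY rowid",
    ("S001",),
).fetchall()
print(rows)
```

Because the file never changes, the load is a one-time cost and the index only has to be built once.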
Both MySQL and MongoDB are equally good for your use-case, as you only want read-only operations on a single collection/table.
For comparison refer to MySQL vs MongoDB 1000 reads
But I would suggest going for MongoDB because of its aggregation pipeline. Though your current use case is very straightforward, in the future you may need more complex operations. In that case, MongoDB's aggregation pipeline will come in very handy.

Vb.net - array, database or sql

I'm making my project for A-level Computing and I've hit a problem that I'm not sure how to solve. I'm making something similar to a student information management system, and I want a way to store a small list of students (maybe 5 to 10), preferably in my program, and refer to them from all forms, so that if a piece of information about one of the students is changed, the change is carried over to the next form. My teacher has very little knowledge of programming, so I'm kind of stuck. I have no previous experience with databases or SQL, but if someone is willing to break it down I'll be very grateful, and I've got a good understanding of arrays. My deadline is the 10th of May, so ASAP please, thanks.
- kyle
If you prefer simplicity I would stick to a CSV file or XML. I found a website which has an end-to-end tutorial on how to create the XML and add items (students) to it using VB.net, per your tag. After that you only need to read them back from the file, and you can just as well add new ones:
http://vb.net-informations.com/xml/how-to-xml-in-vb.net.htm
I would stick with XML (or CSV, as preferred) as it is basically a text file, so you can view it and make changes directly if needed.

Using Lucene QueryAPI to access SQL

Can you advise on whether I can use just the Query functionality from Lucene to generate SQL queries? Something like an SQLQueryBuilder?
I have a massive SQL database of logs from a webserver cluster containing the original request and response strings plus some other useful/less bits and bobs. What I need to do is analyse the parameters in the original request and compare with the generated responses, looking at ratios, volatility, variability, consistency etc.
This question does not relate to the analysis stage, but only the retrieval of data from database which matches the parameters I'm interested in. So, I could just do this in good old sql queries, manually building the exact queries I need on a case-by-case basis. But that's kinda lame; I reckon we can be a bit smarter than that. Particularly as I can already see large numbers of similar but subtly different queries being useful. And as I'm hoping that I can expose a single search box via a web interface to non-technical end-users, adding sql queries seems like a bad idea... and a recipe for permanent maintenance requests (and can I be the first to say, er no thanks!).
In an ideal world I expose a search form, with the option to write simple queries like
request:"someAttribute=\"someValue\"" AND response="some hoped for result" AND daterange:30
which would then hopefully find all instances of requests which contain someAttribute="someValue" over the last 30 days. The results will then be put through standard statistical analyses on the given response text and printed out on-screen. At least, that's the idea.
Much of the actual logic to determine how to handle custom field definitions or special words I'll need to write myself, and that's ok. And NB, my non-technical end users are familiar enough with xml that they can handle a bit of attr="value" syntax, at least for the first iteration of the tool :D
In summary, I want to:
1) allow users to use google-like search syntax (e.g. via Lucene's QueryAPI) to specify text to match in the logs
2) allow a layer to manipulate the query based on special words or fields (e.g. this layer could be during a Java object phase)
3) convert the final query into an sql query appropriate for my database schema
4) query the database and spit back the resultset for statistical analysis
5) pretty-print on website:)
Am I completely barking up the wrong tree? It looks like it should be possible, but I can't seem to find much on it. I've been googling for a bit on this, for example trying "Lucene SQLQueryBuilder" as a possible start but didn't really find much by way of a lead.
So, my questions are:
Has anyone tried using Lucene's QueryAPI like this before? Did it work? Any gotchas?
Are there better query api libraries out there?
Examples, finished discussions and open-source implementations would be most helpful.
Many thanks.
NB: I don't think I want Lucene's search capabilities as such, as I'm only ever looking for exact matches. I just need a query layer on top of the database.
Lucene and SQL have very little in common, as they use totally different syntax (as HefferWolf mentioned) and different underlying data models. As you said yourself, I'm afraid you're barking up the wrong tree.
There are, however, attempts to bridge this gap, such as Hibernate Search. These are interesting experiments as such, but I would be very careful about using any of that code in production.
You could possibly use Full Text Search features available in some SQL databases, or reindex all data in Lucene and use it without database.
I doubt you can reuse any code from lucene for this. Lucene does an internal rewrite of such queries but into a syntax which wouldn't be of much help for SQL I think.
name: Phil AND lastname: Miller AND NOT age: 26
would be rewritten to
+name: Phil +lastname: Miller -age: 26
So I think you would have to write your own translation into SQL query syntax.
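For a sense of what such a hand-written translation layer might involve, here is a minimal Python sketch that turns the `field: value AND NOT field: value` form shown above into a parameterized WHERE clause. It is an illustration, not a Lucene-compatible parser: it handles only AND, NOT and exact matches, and the `COLUMNS` whitelist and `to_sql_where` helper are hypothetical names.

```python
import re

# Optional NOT, a field name, a colon, then a bare word or a quoted phrase.
TERM = re.compile(r'(NOT\s+)?(\w+):\s*("([^"]*)"|\S+)')

# Hypothetical whitelist mapping search fields to real column names.
COLUMNS = {"name": "name", "lastname": "lastname", "age": "age"}

def to_sql_where(query):
    """Translate 'field: value AND NOT field: value' into (clause, params)."""
    clauses, params = [], []
    # Splitting on AND is naive: a quoted phrase containing " AND " would break it.
    for part in re.split(r"\s+AND\s+", query.strip()):
        match = TERM.fullmatch(part)
        if match is None:
            raise ValueError(f"cannot parse term: {part!r}")
        negated, field, raw, quoted = match.groups()
        if field not in COLUMNS:
            raise ValueError(f"unknown field: {field}")
        value = quoted if quoted is not None else raw
        operator = "<>" if negated else "="
        # Values go into bound parameters, never into the SQL text itself.
        clauses.append(f"{COLUMNS[field]} {operator} ?")
        params.append(value)
    return " AND ".join(clauses), params

where, params = to_sql_where("name: Phil AND lastname: Miller AND NOT age: 26")
print(where)
print(params)
```

Keeping the field whitelist and bound parameters separate from the user's text matters here, since the search box is meant to be exposed to end users.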
But maybe you can use Lucene as such for this. Have a look into hibernate-search which is quite handy to easily create a lucene index of a sql table.

Wiki Database, is there one?

I was searching the net for something like a wiki database, just like wikipedia but instead stores structured content, editable by users. What I was looking for was an online database accessible by everyone where people can design the schema and data with proper versioning of both schema and data. I couldn't find any such site. I am not sure if it is my search skills or if there really is no wiki database as of now. Does anyone out there know anything like this?
I think there is a great potential for something like this. A possible example will be a website with a GUI for querying a MySQL DB where any website visitor can create DB objects and populate data.
UPDATE: I had registered the domain wikidatabase.org to get started on a tool but I didn't find enough time yet. If anyone is interested in spending some time and coding on this, please let me know at wikidatabase.org
It's not quite what you're looking for, but Semantic Mediawiki adds database-like features to MediaWiki:
http://semantic-mediawiki.org/wiki/Semantic_MediaWiki
It's still fundamentally a Wiki, but you can add semantic tags to pages ([[foo::bar]] [[baz::1000]]) and then do database-type queries across them: SELECT baz FROM pages WHERE foo=bar would be {{#ask: [[foo::bar]] | ?baz}}. There is even an embryonic SPARQL implementation for pseudo-SQL queries.
OK this question is old, but Google led me here, so for anyone else out there looking for a wiki for structured data: Take a look at Foswiki.
This might be like what you're looking for: dbpedia.org. They're working on extracting data from Wikipedia, and encoding it in a structured format using RDF, so that it can be queried using SPARQL.
Linkeddata.org has a big list of RDF data sets.
Do you mean something like http://www.freebase.com?
You should check out https://www.wikidata.org/wiki/Wikidata:Main_Page which is a bit different but still may be of interest.
Something that might come close to your requirements is Google Docs.
What's offered is document editing roughly similar to MS Word, and spreadsheets roughly similar to Excel. I'm thinking of the latter, of course.
In Google Docs, you can create spreadsheets for free; being spreadsheets, they naturally have a row-and-column structure similar to a database, which you can define flexibly. You can also share these sheets with other people. This seems to be a by-invite-only process rather than open-to-all, but there may be other possibilities I'm not aware of, or that level of sharing might be enough for you in any case.
mindtouch should be able to do it. It's rather easy to get data in / out. (for example: it's trivial to aggregate all the IP's for servers into one table).
I pretty much use it as a DB in the wiki itself (pages have tables, key/value..inheritance, templates, etc...) but you can also interface with the API, write dekiscript, grab the XML...
I like this idea. I have heard of some sites that are trying to pull together large datasets for various things for open consumption, but none that would allow a wiki feel.
You could start with something as simple as an installation of phpMyAdmin with a known password that would allow people to log in, create a database, edit data and query from any other site on the web.
It might suffer from more accuracy problems than wikipedia though.
OpenRecord, development of which seems to have halted in 2008, seems to approach this. It is a structured wiki in which pages are views on the data. Unlike RDBMSes it is loosely typed - the system tries to make a best guess about what data you entered, but defaults to text when it cannot guess. Schemas appear to have been implied.
http://openrecord.org
An example of the typing that is given is that of a date. If you enter '2008' in a record, the system interprets this as a date. If you enter 'unknown' however, the system allows that as well.
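The "best guess, else text" behaviour can be sketched in a few lines of Python. This is only an illustration of the idea; the list of formats tried is an assumption and not OpenRecord's actual rules.

```python
from datetime import datetime

def guess_value(text):
    """Return a date when the input looks like one, otherwise the original text."""
    cleaned = text.strip()
    # Formats tried in order; this list is an assumption, not OpenRecord's rules.
    for fmt in ("%Y-%m-%d", "%Y"):
        try:
            return datetime.strptime(cleaned, fmt).date()
        except ValueError:
            continue
    # Nothing matched, so fall back to plain text.
    return text

print(guess_value("2008"))
print(guess_value("unknown"))
```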
Perhaps you might be interested in Couch DB:
"Apache CouchDB is a document-oriented database that can be queried and indexed in a MapReduce fashion using JavaScript. CouchDB also offers incremental replication with bi-directional conflict detection and resolution."
I'm working on an Open Source PHP / Symfony / PostgreSQL app that does this.
It allows multiple projects, each project can have multiple directories, each directory has a defined field structure. Admins set all this up.
Then members of the public can suggest new records, edit or report existing ones. All this is moderated and versioned.
It's early days yet but it basically works and is already in real world use in several projects.
Future plans already in progress include tools to help keep the data up to date, better searching/querying and field types that allow translations of content between languages.
There is more at http://www.directoki.org/
I'm surprised that nobody has mentioned Wikibase yet, which is the software that powers Wikidata.