RediSql (for redis): Get column names as well as data type? - sql

I am using the excellent RediSql, a module for Redis, to get a powerful caching solution.
When I send a command to Redis that interacts with the SQLite db in the background, like this:
REDISQL.EXEC db "SELECT * FROM jobcache"
I get a result like this:
I get a type for the integer column, but not for the string, and no column names are provided.
Is there a way to get column name and defined data type always? I would need this, as I need to convert the results back to a more standard sql result format.

Unfortunately, at the moment this is not possible with the EXEC command.
You can use the QUERY.INTO command instead (see the command reference).
QUERY.INTO adds the result of your query to a stream: for each row, it adds the columns and the values. You can then consume the stream in whichever way you prefer.
When running queries (reads) against RediSQL, it is good practice to use the .QUERY family of commands; this avoids useless replication of data if you are in a cluster setup.
Moreover, the .QUERY commands can also be used against replicas of the main Redis instance, while the .EXEC commands can be used only against the primary instance.
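To convert the stream back into a more standard result format, a small client-side helper may be enough. The sketch below is not part of RediSQL; it assumes the stream fields are named in a `type:column` form (for example `int:id`, `text:name`), which you should verify against your own stream with XRANGE. The stream name `jobcache_stream` and the helper name `rows_from_stream` are hypothetical.

```python
def rows_from_stream(entries):
    """Convert XRANGE-style entries into a list of rows.

    `entries` is a list of (entry_id, fields_dict) pairs, the shape
    redis-py returns from xrange(). Each row maps a column name to a
    (declared_type, value) tuple.
    """
    rows = []
    for _entry_id, fields in entries:
        row = {}
        for field, value in fields.items():
            # Assumption: field names look like "int:id" or "text:name".
            col_type, sep, col_name = field.partition(":")
            if not sep:
                # No type prefix found; keep the whole field as the name.
                col_type, col_name = None, field
            row[col_name] = (col_type, value)
        rows.append(row)
    return rows


# Possible usage with redis-py (needs a running Redis with RediSQL loaded):
#   import redis
#   r = redis.Redis(decode_responses=True)
#   r.execute_command("REDISQL.QUERY.INTO", "jobcache_stream", "db",
#                     "SELECT * FROM jobcache")
#   rows = rows_from_stream(r.xrange("jobcache_stream"))
```

Stream values come back as strings, so the type prefix is what lets the caller decide how to cast each value.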

Related

Cleaning up raw syslog events coming into Sentinel - json/string

I'm having an issue parsing out syslog data coming into Sentinel. I think it's a misunderstanding of the data types and what my options are when working with them.
I have some raw syslog coming into Sentinel. This data is being ingested with 4 columns: TimeStamp, SyslogMessage, Computer, and Facility. The 'SyslogMessage' column is the one with by far the most data in it, but I'm having issues parsing it out to make it useful. I'd like to be able to take pieces out of the "SyslogMessage" column, and extend new columns from that data, which will give a better ability to manipulate that data than some string operator like contains.
For instance, in a separate situation I had some raw event data coming through as what I think is Json. With this dataset, I was able to do something like extend c = RawEventData.AccountMoniker, which would give me a column 'c' and would only project the AccountMoniker data. Here is an example of that working dataset:
The data set that I am currently working with, looks like this picture. It looks to be formatted similarly to json, but seems to have had a string prefixed to the beginning of it which made the rest of the data a string I think. Here is that data:
I've been able to work in some regex and get the 'SyslogMessage' down to just the bracketed material, but have still been having issues when trying to do something like 'parse_json'. Right now, the only way I'm able to search through this data is using 'has' or 'contains'. What are my options for getting the 'SyslogMessage' data into a type that I can more easily search through and project as columns?

How do you get a list of streams?

I need to get a list of all streams (keys) in a database but I can't find a command for it.
I've already tried going over all keys and checking their type, but it is too slow/expensive.
I'd like to do something like XSCAN and get a list of keys like: ["stream1", "stream2"]
As of version 6.0 you can use the TYPE option to ask SCAN to only return objects that match a given type.
SCAN 0 TYPE stream
https://redis.io/commands/scan
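From a client library, the same filter can usually be passed through the SCAN wrapper. A minimal sketch assuming redis-py, whose `scan_iter` accepts a `_type` argument; the function name `list_stream_keys` and the batch size are illustrative, and the server must be Redis 6.0 or later for the TYPE option to work.

```python
def list_stream_keys(client, batch_size=1000):
    """Return the names of all stream keys in the current database.

    `client` is expected to behave like redis.Redis: scan_iter() walks
    the keyspace with SCAN, and _type="stream" adds `TYPE stream`.
    """
    return sorted(client.scan_iter(count=batch_size, _type="stream"))


# Possible usage (needs a running Redis >= 6.0):
#   import redis
#   client = redis.Redis(decode_responses=True)
#   print(list_stream_keys(client))
```

SCAN still walks the whole keyspace on the server side, so this saves the per-key TYPE round trips rather than the iteration itself.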
There's no such command. Same as there's no way to get a list of other data structures, e.g. LIST, SET.
Instead, you can create an extra SET to record the keys of the streams you created. So that you can scan the SET to get the list of streams.
If you can use a prefix in the stream names, e.g. 'MyStream:1', 'MyStream:2',
then you can use the regular SCAN command with a pattern matching MyStream:*
EDIT:
To address OP's concern about not having to use a prefix and using the SCAN command as is, adding from the comments:
You can avoid using a prefix by using the namespacing capability provided by Redis. You can dedicate a 'database' (0-15 by default) to stream names. Say you use database 5 for streams; a SCAN command in database 5 should then return only the keys in it. redis.io/commands/select

Azure Stream Analytics -> how much control over path prefix do I really have?

I'd like to set the prefix based on some of the data coming from event hub.
My data is something like:
{"id":"1234",...}
I'd like to write a blob prefix that is something like:
foo/{id}/guid....
Ultimately I'd like to have one blob for each id. This will help how it gets consumed downstream by a couple of things.
What I don't see is a way to create prefixes that aren't related to date and time. In theory I can write another job to pull from blobs and break it up after the stream analytics step. However, it feels like SA should allow me to break it up immediately.
Any ideas?
{date} , {time} and {partition} are the only ones supported in blob output prefix. {partition} is a number.
Using a column value in blob prefix is currently not supported.
If you have a limited number of such {id}s, you could work around this by writing multiple "select --" statements with different filters, each writing to a different output, and hardcoding the prefix in each output. Otherwise it is not possible with just ASA.
It should be noted that now you actually can do this. Not sure when it was implemented but you can now use a single property from your message as a custom partition key and the syntax is exactly as the OP has asked for: foo/{id}/something/else
More details are documented here: https://learn.microsoft.com/en-us/azure/stream-analytics/stream-analytics-custom-path-patterns-blob-storage-output
Key points:
Only one custom property allowed
Must be a direct reference to an existing message property (i.e. no concatenations like {prop1+prop2})
If the custom property results in too many partitions (more than 8,000) then an arbitrary number of blobs may be created for the same partition

Dynamically execute a transformation against a column at runtime

I have a Pentaho Kettle job that can load data from x number of tables, and put it into target tables with a different schema.
Assume I have table 1, like so:
I want to load this table into a destination table that looks like this:
The columns have been renamed, the order has been changed, and the data has been transformed. The rename, and order is easily managed by using the Select Values step, which can be used within an ETL Metadata Injection step, making it dependent on some configuration values loaded at runtime.
But if I need to perform some transformation logic on some of the columns, based on where they go in the target table, this seems to be less straightforward.
In my example, I want the column "CountryName" to be capitalised, and the column "Rating" to be floored (as in changing the real number to the previous integer value).
While I could do this by just manually adding a transformation to accomplish each, I want my solution to be dynamic, so it could just as easily run the "CountryName" column through a checksum component, or perform a ceiling on "Rating" instead.
I can easily wrap these transformations in another transformation so that they can be parameterised and executed when needed:
But, where I'm having trouble is, when I process a row of data, I need a way to be able to say:
Column "CountryName" should be passed through the Capitalisation transform
Column "Rating" should be passed through the Floor transform
Column(s) "AnythingElse" should be passed through the SomeOther transform
Is there a way to dynamically split out the columns in a row, and execute a different transform on each one, based on some configuration metadata that can be supplied?
Logically, it would be something like this, although I suspect there may be a way to handle it as a loop or some form of dynamic transformation, rather than mapping out a path per column:
Kettle is so flexible that it seems like there must be a way to do this, I'm just struggling to know which components to use and how to do it. Any experts out there have some suggestions?
I'm dealing with some biggish data sets here (hundreds of millions of rows) so reluctant to use Row Normaliser/Denormaliser or writing to file/DB if possible.
Have you considered the Modified Java Script Value step? Start with the Data Grid step, then a Select Values step, then the Modified Java Script Value step. In that step you transform the value of each column into whatever form you want and output the result to a file.
That of course requires some JavaScript knowledge, but given your example the required knowledge seems pretty basic.
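The per-column dispatch the question describes can be expressed as a lookup from column name to transform. The sketch below is plain Python rather than Kettle code, shown only to illustrate the logic; inside a Modified Java Script Value step the same idea would have to be written in JavaScript. The names `TRANSFORMS`, `COLUMN_RULES` and `transform_row` are hypothetical, and "capitalise" is assumed here to mean upper-casing.

```python
import math

# Available transforms, keyed by a name that configuration can refer to.
TRANSFORMS = {
    "capitalise": lambda value: str(value).upper(),
    "floor": lambda value: math.floor(float(value)),
    "ceiling": lambda value: math.ceil(float(value)),
}

# Metadata that would be loaded at runtime: column name -> transform name.
COLUMN_RULES = {"CountryName": "capitalise", "Rating": "floor"}


def transform_row(row, rules=COLUMN_RULES):
    """Apply the configured transform to each column; pass others through."""
    out = {}
    for column, value in row.items():
        rule = rules.get(column)
        out[column] = TRANSFORMS[rule](value) if rule else value
    return out
```

Swapping "floor" for "ceiling" in the rules changes the behaviour without touching the row-processing code, which is the dynamic property the question asks for.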

Can I maintain state between calls to a SQL Server UDF?

I have a SQL script that inserts data (via INSERT statements, currently numbering in the thousands). One of the columns contains a unique identifier (though not an IDENTITY type, just a plain ol' int) that's actually unique across a few different tables.
I'd like to add a scalar function to my script that gets the next available ID (i.e. last used ID + 1) but I'm not sure this is possible because there doesn't seem to be a way to use a global or static variable from within a UDF, I can't use a temp table, and I can't update a permanent table from within a function.
Currently my script looks like this:
declare @v_baseID int
exec dbo.getNextID @v_baseID out --sproc to get the next available id
--Lots of these - where n is a hardcoded value
insert into tableOfStuff (someStuff, uniqueID) values ('stuff', @v_baseID + n )
exec dbo.UpdateNextID @v_baseID + lastUsedn --sproc to update the last used id
But I would like it to look like this:
--Lots of these
insert into tableOfStuff (someStuff, uniqueID) values ('stuff', getNextID() )
Hardcoding the offset is a pain in the arse, and is error prone. Packaging it up into a simple scalar function is very appealing, but I'm starting to think it can't be done that way since there doesn't seem to be a way to maintain the offset counter between calls. Is that right, or is there something I'm missing?
We're using SQL Server 2005 at the moment.
edits for clarification:
Two users hitting it won't happen. This is an upgrade script that will be run only once, and never concurrently.
The actual sproc isn't prefixed with sp_, fixed the example code.
In normal usage, we do use an id table and a sproc to get IDs as needed, I was just looking for a cleaner way to do it in this script, which essentially just dumps a bunch of data into the db.
I'm starting to think it can't be done that way since there doesn't seem to be a way to maintain the offset counter between calls. Is that right, or is there something I'm missing.
You aren't missing anything; SQL Server does not support global variables, and it doesn't support data modification within UDFs. And even if you wanted to do something as kludgy as using CONTEXT_INFO (see http://weblogs.sqlteam.com/mladenp/archive/2007/04/23/60185.aspx), you can't set that from within a UDF anyway.
Is there a way you can get around the "hardcoding" of the offset by making that a variable and looping over the iteration of it, doing the inserts within that loop?
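If the INSERT statements are generated rather than typed by hand, the offsets can be numbered outside the database altogether. A rough sketch under that assumption: the helper name `build_inserts`, the starting offset of 0, and the T-SQL variable spelling `@v_baseID` are illustrative choices, not something the question's script confirms.

```python
from itertools import count


def build_inserts(values, table="tableOfStuff", base_var="@v_baseID",
                  first_offset=0):
    """Return INSERT statements with automatically numbered ID offsets."""
    offsets = count(first_offset)
    statements = []
    for value in values:
        # Double any single quotes so the literal stays valid T-SQL.
        escaped = value.replace("'", "''")
        statements.append(
            f"insert into {table} (someStuff, uniqueID) "
            f"values ('{escaped}', {base_var} + {next(offsets)})"
        )
    return statements
```

The last offset used is `first_offset + len(values) - 1`, which is the value the script would then hand to the sproc that updates the last used ID.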
If you have 2 users hitting it at the same time they will get the same id. Why didn't you use an id table with an identity instead? Insert into that and use the generated value as the unique id (which is guaranteed). This will also perform much faster.
sp_getNextID
Never ever prefix procs with sp_. This has performance implications, because the optimizer first checks the master DB to see if that proc exists there and only then the local DB. Also, if MS decides to create an sp_getNextID in a service pack, yours will never get executed.
It would probably be more work than it's worth, but you can use static C#/VB variables in a SQL CLR UDF, so I think you'd be able to do what you want by simply incrementing this variable every time the UDF is called. The static variable would be lost whenever the appdomain unloaded, of course. So if you need continuity of your ID from one day to the next, you'd need a way, on first access of NextId, to poll all of the tables that use this ID to find the highest value.