How to store a set of strings in Redshift? - sql

I need to store an array of strings as a table column in Redshift and then check whether this array contains some string.
Since Redshift doesn't support array types, I started looking for ways around it. The only thing I came up with is to encode the array as a pipe-separated string, escaping all the pipes within the elements of the array beforehand. Element lookups would be done using regexps.
While this solution seems to be viable, it requires some pre- and post-processing. Maybe you can recommend some alternatives?
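For illustration, the pre- and post-processing described above could be sketched in Python roughly as follows. This is a minimal sketch, not Redshift code: the function names are hypothetical, and it assumes a backslash is used as the escape character and that an empty string stands for an empty array.

```python
from typing import List


def escape(item: str) -> str:
    """Escape backslashes first, then pipes, so the encoding is reversible."""
    return item.replace("\\", "\\\\").replace("|", "\\|")


def encode(items: List[str]) -> str:
    """Join the escaped items with a pipe separator."""
    return "|".join(escape(item) for item in items)


def decode(encoded: str) -> List[str]:
    """Split on unescaped pipes and undo the escaping."""
    if encoded == "":
        return []  # assumption: an empty string means an empty array
    items, current, escaped = [], [], False
    for ch in encoded:
        if escaped:
            current.append(ch)  # character after a backslash is taken literally
            escaped = False
        elif ch == "\\":
            escaped = True
        elif ch == "|":
            items.append("".join(current))  # unescaped pipe ends the element
            current = []
        else:
            current.append(ch)
    items.append("".join(current))
    return items


def contains(encoded: str, needle: str) -> bool:
    """Membership test on the decoded elements."""
    return needle in decode(encoded)
```

Note that a plain substring search for `|b|` would wrongly match inside the escaped element `a\|b`, which is why the sketch decodes before comparing. A regexp-based lookup has to account for escaped pipes in the same way.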

Related

Data Factory expression substring? Is there a function similar to right?

Please help,
How could I extract 2019-04-02 out of the following string with Azure data flow expression?
ABC_DATASET-2019-04-02T02:10:03.5249248Z.parquet
The first part of the string, received as a ChildItem from a GetMetaData activity, is dynamic. So in this case it is ABC_DATASET that is dynamic.
Kind regards,
D
There are several ways to approach this problem, and they are really dependent on the format of the string value. Each of these approaches uses Derived Column to either create a new column or replace the existing column's value in the Data Flow.
Static format
If the format is always the same, meaning the length of the sections is always the same, then substring is simplest:
This will parse the string like so:
Useful reminder: substring and array indexes in Data Flow are 1-based.
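As a rough illustration of the fixed-position idea, here is a Python sketch using the file name from the question. It is not Data Flow syntax, and it assumes the prefix always has the same length as ABC_DATASET. Python slices are 0-based, so the equivalent 1-based start position is one higher.

```python
name = "ABC_DATASET-2019-04-02T02:10:03.5249248Z.parquet"

# Assumption: the prefix always has the same length as "ABC_DATASET".
PREFIX_LEN = len("ABC_DATASET")  # 11 characters
DATE_LEN = len("yyyy-MM-dd")     # 10 characters

# Skip the prefix and the hyphen that follows it (0-based index).
start = PREFIX_LEN + 1
date_part = name[start:start + DATE_LEN]

# The same position expressed 1-based, as Data Flow counts it.
one_based_start = start + 1
```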
Dynamic format
If the format of the base string is dynamic, things get a tad trickier. For this answer, I will assume that the basic format of {variabledata}-{timestamp}.parquet is consistent, so we can use the hyphen as a base delineator.
Derived Column has support for local variables, which is really useful when solving problems like this one. Let's start by creating a local variable to convert the string into an array based on the hyphen. This will lead to some other problems later since the string includes multiple hyphens thanks to the timestamp data, but we'll deal with that later. Inside the Derived Column Expression Builder, select "Locals":
On the right side, click "New" to create a local variable. We'll name it and define it using a split expression:
Press "OK" to save the local and go back to the Derived Column. Next, create another local variable for the yyyy portion of the date:
The cool part of this is I am now referencing the local variable array that I created in the previous step. I'll follow this pattern to create a local variable for MM too:
I'll do this one more time for the dd portion, but this time I have to do a bit more to get rid of all the extraneous data at the end of the string. Substring again turns out to be a good solution:
Now that I have the components I need isolated as variables, we just reconstruct them using string interpolation in the Derived Column:
Back in our data preview, we can see the results:
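The same sequence of steps (split on the hyphen, pick the year and month elements, trim the day element, then reassemble) can be sketched in Python. This is only an analogy to the Derived Column locals, not Data Flow syntax. The function name is hypothetical, and the sketch assumes the variable prefix itself contains no hyphens.

```python
def extract_date(name: str) -> str:
    """Return the yyyy-MM-dd part of '{variabledata}-{timestamp}.parquet'."""
    # Split on the hyphen; the timestamp contributes extra hyphens.
    parts = name.split("-")
    # Python lists are 0-based, so these are elements 2, 3 and 4 when
    # counted 1-based as in Data Flow.
    yyyy = parts[1]
    mm = parts[2]
    # The last element still carries the time and the file extension,
    # so keep only its first two characters.
    dd = parts[3][:2]
    return f"{yyyy}-{mm}-{dd}"
```

If the prefix could contain hyphens, indexing from the end of the list (`parts[-3]`, `parts[-2]`, `parts[-1][:2]`) would be a more tolerant variant.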
Where else to go from here
If these solutions don't address your problem, then you have to get creative. Here are some other functions that may help:
regexSplit
left
right
dropLeft
dropRight
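As one example of getting creative, the date can also be located by pattern rather than by position. The following Python sketch shows the idea only. It is not Data Flow syntax, the names are hypothetical, and it assumes the first yyyy-MM-dd shaped run of digits in the name is the date you want.

```python
import re
from typing import Optional

# Four digits, hyphen, two digits, hyphen, two digits.
DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")


def find_date(name: str) -> Optional[str]:
    """Return the first date-shaped substring, or None if there is none."""
    match = DATE_PATTERN.search(name)
    return match.group(0) if match else None
```

Because the match does not depend on where the hyphens fall, this also copes with prefixes that contain hyphens themselves.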

Alternative to case statements when changing a lot of numeric controls

I'm pretty new to LabVIEW, but I do have experience in other programming languages like Python and C++. The code I'm going to ask about works, but there was a lot of manual work involved in putting it together. Basically, I read from a text file and change control values based on the values in the text file; in this case it's 40 values.
I have set it up to pull from a text file and split the string by commas. Then I loop through all the values and set the indicator to read the corresponding value. I had to create 40 separate case statements to achieve this. I'm sure there is a better way of doing this. Does anyone have any suggestions?
In addition to what sweber suggested, the following improvements are possible:
If the file contains only data, without a "label - value" format, you could read it as CSV (comma-separated values) and use just the first row.
Currently, you set values based on their order. In that case, you could create references to all the indicators, build them into an array in the proper order, and in a For Loop assign the values to the indicators via the Value property node.
Overall, I agree with sweber: if this is key-value data, it is better to use either JSON or the .ini file format, both of which support such a structure.
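Since the asker knows Python, the array-of-references idea can be sketched there as a text analogy. LabVIEW itself is graphical, so this is not LabVIEW code. The `Panel` class and the indicator names are hypothetical stand-ins for the front panel and its indicators.

```python
class Panel:
    """Hypothetical stand-in for a front panel with named numeric indicators."""

    def __init__(self, names):
        for name in names:
            setattr(self, name, 0.0)


# The ordered list plays the role of the array of indicator references.
indicator_order = ["temperature", "pressure", "flow_rate"]


def apply_values(panel, names, line):
    """Assign comma-separated values to the indicators, purely by position."""
    values = [float(v) for v in line.split(",")]
    if len(values) != len(names):
        raise ValueError("number of values does not match number of indicators")
    # One loop replaces one case per indicator.
    for name, value in zip(names, values):
        setattr(panel, name, value)
```

The point is that the mapping from position to indicator lives in one ordered list, so adding a 41st value means adding one list entry rather than another case.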
Let's start with some optimization:
It seems your data file contains nothing more than just 40 numbers. You can wire a 1D DBL array to the default input of the string-to-array VI, and you will get just a 1D array out. No need for a 2D array.
Second, there is no need to convert the FOR index value to a string, the CASE accepts integers, too.
Now, about your question: The simplest solution is to display the values as array, just as they come from the string-to-array VI.
But I guess each value has a special meaning, and you would like to display its name/description somehow. In this case, create a cluster with 40 values, edit their labels as you like, and make sure their order in the cluster is the same as the order of the values in the file.
Then, wire the 1D array of values to this cluster via an array-to-cluster VI.
If you plan to use the text file to store and load the values, converting the cluster data to JSON and vice versa might be something for you, as it carries the labels of the cluster into the file, too. (However, changing labels then becomes an issue.)
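To show why labels in the file help and why renaming them hurts, here is a small Python sketch of the JSON round trip. It is an analogy rather than LabVIEW code, and the label names and function names are hypothetical.

```python
import json

# Hypothetical labels, in the same order as the values.
labels = ["temperature", "pressure", "flow_rate"]


def values_to_json(labels, values):
    """Store each value under its label instead of relying on position."""
    return json.dumps(dict(zip(labels, values)), indent=2)


def json_to_values(labels, text):
    """Load values by label; a renamed label raises KeyError."""
    data = json.loads(text)
    return [data[label] for label in labels]
```

Loading by label means the order of entries in the file no longer matters, but a file written with an old label can no longer be read once the label changes.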

Add Array to field instead of String

I have the following problem:
I want to set my field's value to the following array: value = [0,16,33,50,67,84,101,118,135,152,169,186,203,220,237,254,271,288,305,322,338,355,372,389,406,423,440,457,474,491,508,525,542,559,576,593,610,627,644,661,677,694,711,728,745,762,779,796,813,830,847,864,881,898,915,932,949,966,983,1000,1016,1033,1050,1067,1084,1101,1118,1135,1152,1169,1186,1203,1220,1237,1254,1271,1288,1305,1322,1338,1355,1372,1389,1406,1423,1440,1457,1474,1491,1508,1525,1542,1559,1576,1593,1610,1627,1644,1661,1677]
I tried the JSON field type and several others, but each returns the value as a string (wrapped in ""), which breaks the processing I do with it. How can I work around this?
I'm not entirely sure if this answers your question, but Directus 6 saves data in only the MySQL 5 datatypes. Therefore, CSV / JSON values are saved as strings (often in the TEXT datatype). If you want to use this data in your application as an array / JSON, you will have to convert it yourself.
The Directus team is working to support more (custom) datatypes in future versions, so the API can respond with nested arrays/objects in JSON.
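Converting it yourself can be as small as one parsing call on the client side. A minimal Python sketch, assuming the field comes back as a JSON-formatted string (the shortened `raw` value below is a made-up sample, not actual API output):

```python
import json

# Assumption: the API returned the field as a string like this.
raw = "[0,16,33,50,67]"

# Parse the string into a real list of integers.
values = json.loads(raw)
```

After this, `values` behaves like any other list, so indexing and arithmetic work as expected.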

Use Postgres to parse stringified JSON object

I've been using Postgres to store JSON objects as strings, and now I want to utilize PG's built-in json and jsonb types to store the objects more efficiently.
Basically, I want to parse the stringified JSON and put it in a json column from within PG, without having to resort to reading all the values into Python and parsing them there.
Ideally, my migration should look like this:
UPDATE table_name SET json_column=parse_json(string_column);
I looked at Postgres's JSON functions, and there doesn't seem to be a method of doing this, even though it seems pretty trivial. For the record, my JSON objects are just one-dimensional arrays of strings.
Is there any way to do this?
There is no need for a parse_json function; just change the type of the column:
ALTER TABLE table_name
ALTER COLUMN json_column TYPE json USING json_column::json;
Note that if you plan on doing a lot of JSON operations on these values (e.g. extracting elements from objects, modifying objects, etc.), it's better to use jsonb. json should only be used for storing JSON data. Also, as Laurenz Albe points out, if you don't need to do any JSON operations on these values and you are not interested in the validation that PostgreSQL can do on them (e.g. because you trust that the source always provides valid JSON), then using text is a perfectly valid option (or bytea).

How are the fields delimited in a serialized byte stream?

I was wondering how the fields of a serialized object are delimited in a byte stream.
Is there some kind of binary flag separating them, or are the length of each field defined at the beginning (or is it using another delimitation technique I didn't think of)?
Thanks!
Serialization is the process of converting nested data (an object) into a flat stream. There are many ways to do that, and each has its own specification. If you want details, tell us which serialization format you are interested in, and we can look for the docs.
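To make one of the techniques from the question concrete, here is a minimal Python sketch of length-prefixed fields. It illustrates the general idea only and is not the layout of any particular serialization format. The function names and the choice of a 4-byte big-endian length are assumptions.

```python
import struct


def pack_fields(fields):
    """Write each string field as a 4-byte big-endian length plus UTF-8 bytes."""
    out = bytearray()
    for field in fields:
        data = field.encode("utf-8")
        out += struct.pack(">I", len(data))  # length prefix
        out += data                          # payload
    return bytes(out)


def unpack_fields(stream):
    """Read fields back by following the length prefixes."""
    fields, offset = [], 0
    while offset < len(stream):
        (length,) = struct.unpack_from(">I", stream, offset)
        offset += 4
        fields.append(stream[offset:offset + length].decode("utf-8"))
        offset += length
    return fields
```

Because the reader knows each field's length in advance, the payload may contain any byte, including one that a delimiter-based scheme would have to escape.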