In Informatica, if we can generate sequence numbers using an Expression transformation, why do we need the Sequence Generator transformation?
The Sequence Generator transformation has certain features that are difficult, if not impossible, to implement using an Expression transformation. For example, a single Sequence Generator transformation can be connected to multiple target instances and will still generate a unique sequence across those targets. A Sequence Generator can also persist values without separate mapping variables, and a reusable Sequence Generator can be used in multiple mappings to generate unique sequences across those mappings.
From the Snowflake documentation:
If a statement that calls RANDOM is executed more than once, there is no guarantee that RANDOM will generate the same set of values each time. This is true whether or not you specify a seed.
What's the use of a random seed if it doesn't let you write reproducible code? Is there a way around this, so that if I run the same query again I'll get the same rows every time, even if the rows are ordered randomly using a seed?
For example,
SELECT ID,
ROW_NUMBER() OVER (PARTITION BY group_name ORDER BY RANDOM(123)) AS random_n
FROM my_table
QUALIFY random_n < 100
Repeatable random numbers are really tricky in a parallel database -- and in general, simply not worth the effort. This is even harder on cross-platform databases.
As the documentation suggests, the purpose of RANDOM(seed) is to return the same value for multiple calls within a row. This seems like a micro-efficiency, because you should be able to generate the same effect using a CTE or subquery.
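For instance, a minimal sketch of that subquery approach (table and column names are made up): roll the random value once in an inner query, then reuse it as often as needed in the outer one.

SELECT id, rnd, rnd * 2 AS rnd_doubled  -- rnd is reused, not re-rolled
FROM (
    SELECT id, RANDOM() AS rnd
    FROM my_table
) t;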
The documentation also suggests using SEQ functions for certain purposes. In fact, you can generate your own repeatable pseudo-random number generator using the seq values -- assuming the underlying ordering of the data is constant. My guess is that Snowflake prefers this method for a repeatable generator.
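A minimal sketch of that idea, assuming Snowflake's SEQ8() and HASH() functions and a stable underlying row order (names are illustrative): hash the sequence value together with a fixed constant that plays the role of the seed.

SELECT id, ROW_NUMBER() OVER (PARTITION BY group_name ORDER BY rnd) AS random_n
FROM (
    SELECT id, group_name, HASH(SEQ8() + 123) AS rnd  -- 123 acts as the 'seed'
    FROM my_table
) t
QUALIFY random_n < 100;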
Kind of a general question here, but is there an easy way to determine, in an Oracle database, whether a field has a sequence attached to it? This seems like it should be obvious, but I'm missing it.
Thanks.
In general, no. Sequences are separate first-class objects. Normally, you'd create one sequence per table and use that sequence consistently to populate the key (via a trigger or via whatever procedural API you have to do the insert). But nothing stops you from using the same sequence to populate multiple tables or writing code that doesn't use the sequence when one exists.
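The classic sequence-plus-trigger pattern looks roughly like this (names are illustrative, and this assumes a recent Oracle version that allows NEXTVAL in a PL/SQL expression):

CREATE SEQUENCE my_table_seq;

CREATE OR REPLACE TRIGGER my_table_bi
BEFORE INSERT ON my_table
FOR EACH ROW
BEGIN
    -- populate the key from the sequence
    :NEW.id := my_table_seq.NEXTVAL;
END;
/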
If you are on a recent version of Oracle, and you are looking only at columns explicitly created as identity columns rather than via the old-school approach of creating a separate sequence and using a trigger or column default to populate the key, you can use the identity_column column in all_tab_columns (or user_tab_columns/dba_tab_columns) to see whether the column was declared as an identity.
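For example, a query along these lines should list the identity columns (12c and later):

SELECT table_name, column_name
FROM   user_tab_columns
WHERE  identity_column = 'YES';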
There is no way to attach a sequence to a field in Oracle; what you can do is use the sequence in your application as you see fit.
In general, you'll need to look for triggers on the table, and for procedures that may be used to insert data into it. Some people use those to regulate sequence use and to sort of attach it to a field, but it's not a real attachment; they are just using the sequence, and it could be used in many other ways.
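One way to hunt for that kind of informal attachment is to search the stored source for NEXTVAL calls, e.g.:

-- find triggers/procedures that reference any sequence
SELECT name, type, line, text
FROM   user_source
WHERE  UPPER(text) LIKE '%NEXTVAL%';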
I am trying to figure out whether there is a way to create a query composed of dynamic logical statements (AND and OR operators) in a configurable and persistent manner.
Say I want to take a set of events and bundle them under an entity called a feature, so that each feature is composed of events.
For example,
featureA is eventA and eventB,
featureB is (eventB and eventC) or eventD
I was considering:
making an S-expression column, saving it in a JSON column, and then parsing it into a query
creating the WHERE clause by hand, saving it in a text column to run later, with a view presenting the data more readably
Then I realised I can't execute (eval-like) stored strings, as mentioned here.
So it comes down to what I was trying to avoid, which is running and manipulating everything via client-side querying. I needed a pure SQL solution for further use by our data analysts.
Any suggestions?
You can execute dynamic SQL statements with https://docs.memsql.com/sql-reference/v6.7/execute-immediate/; see that page for some examples (prepared statements are a different topic, and I don't think they are related to what you are looking for).
You may also be interested in https://docs.memsql.com/concepts/v6.7/persistent-computed-columns/, which lets you define columns that are computed as SQL expressions from other columns, so you could define your features this way.
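For the feature example, a persisted computed column version might look roughly like this (MemSQL syntax; table and column names are made up):

CREATE TABLE events (
    id BIGINT PRIMARY KEY,
    eventA TINYINT NOT NULL,
    eventB TINYINT NOT NULL,
    eventC TINYINT NOT NULL,
    eventD TINYINT NOT NULL,
    -- feature definitions live in the schema itself
    featureA AS (eventA AND eventB) PERSISTED TINYINT,
    featureB AS ((eventB AND eventC) OR eventD) PERSISTED TINYINT
);

Analysts can then query featureA and featureB like ordinary columns, with no client-side logic involved.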
Is it possible with Netezza queries to include SQL files (which contain specific SQL code), or is that not the right way to use it?
Here is an example.
I have some common SQL code (let's say common.sql) which creates a temp table and needs to be used across multiple other queries (let's say analysis1.sql, analysis2.sql, etc.). From a code-management perspective it is quite overwhelming to maintain if the code in common.sql is repeated across the many other queries. Is there a DRY way to do this - something like #include <common.sql> from the other queries to call the reused code in common.sql?
Including SQL files is not the right way to do it. If you wish to persist with this approach, you could use a preprocessor like cpp or even php to assemble the files for you, with a build process to generate the finished ones.
However, from a maintainability perspective you are better off creating views and functions for reusable content. Note that these can pose optimization barriers, so large single queries are often the way to go.
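For instance, the shared logic in common.sql could become a view that the analysis scripts simply reference (names are made up):

CREATE OR REPLACE VIEW common_base AS
SELECT customer_id, SUM(amount) AS total_amount
FROM   sales
GROUP  BY customer_id;

-- analysis1.sql then just selects from it
SELECT * FROM common_base WHERE total_amount > 1000;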
I agree: views, functions (table-valued if needed) or, more likely, stored procedures are the way to go.
We have had a lot of luck letting stored procedures generate complex but repeatable code patterns on the fly based on input parameters and metadata on the tables being processed.
An example: all tables have a 'unique constraint' (which is not really unique, but that doesn't matter, since it isn't enforced in Netezza anyway) with a fixed name of UBK_[tablename].
UBK is used as a 'signal' to the stored procedure, identifying the columns of the BusinessKey for a classic Kimball-style type 2 dimension table.
The SP can then apply the 'incoming' rows to the target table just by being supplied with the name of the target table and a 'stage' table containing all the same column names and data types.
Another example could be an SP that takes a table name and three arguments, each a 'string,of,columns', and does an 'Excel-style pivot': it groups by the columns in the first argument, does a 'select distinct' on the second argument to generate the new column names for the pivoted columns, and sums the column in the third argument into a target table whose name you specify...
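The calls then end up looking something like this (procedure names and arguments are purely illustrative of the pattern, not real code):

CALL apply_scd2_dimension('DIM_CUSTOMER', 'STG_CUSTOMER');
CALL pivot_to_table('SALES_FACT', 'region,year', 'product_code', 'amount');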
Can you follow me?
I think the nzsql command-line tool may be able to do an 'include', but a combination of strong 'building block' stored procedures and Perl/Python and/or an ETL tool will most likely prove a better choice.
Let's say there is an application generating random GUIDs corresponding to a number of normalized records in a few tables. These records, keyed by a GUID tenant_id, need to be split into multiple federation members in SQL Azure. When the SPLIT AT command is issued, what ordering mechanism is used to split members at a specific point (tenant_id)? Is it similar to an ORDER BY GUID_FIELD ASC/DESC result set? Since GUIDs are generated randomly, what is the best way to create ranges for future splits?
Thank you
GUID ranges are split according to their sort order in SQL Server - the same order that is used for ORDER BY and indexes. See this blog post for more details: http://sqlblog.com/blogs/alberto_ferrari/archive/2007/08/31/how-are-guids-sorted-by-sql-server.aspx
If you are generating GUIDs randomly and you need to split, you should use the ordering definition for GUIDs to pick a point somewhere in the middle of the set of GUIDs in the member you are splitting (assuming you want to split in the middle).
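A sketch of finding such a midpoint, relying on SQL Server's GUID sort order (table and column names are illustrative):

-- largest tenant_id in the lower half, by GUID sort order
SELECT MAX(tenant_id) AS split_point
FROM (
    SELECT tenant_id, NTILE(2) OVER (ORDER BY tenant_id) AS half
    FROM tenants
) t
WHERE half = 1;

-- then split the federation at that value, e.g.:
-- ALTER FEDERATION Tenant_Federation SPLIT AT (tenant_id = '<split_point>')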
If you want more control about what tenants go where, you could generate your own, "custom" GUIDs, but then you will of course lose the global uniqueness property that GUIDs have, unless you ensure globally unique generation of your "custom" GUIDs.
-- Hans Olav
Essentially, SPLIT AT splits a single federation member into two. It relies on the distribution key (the key you passed in the FEDERATED ON clause). For example, imagine you federate on age. Originally you have two members: age 0 to 40 and age 41 to 80. Now you split the first one into two parts: 0 to 20 and 21 to 40. SQL Azure automatically reorganizes the data to make sure each member meets its range requirement. So yes, it is kind of like ORDER BY.
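In the age example, that split would be issued as something like (federation name is made up):

ALTER FEDERATION Customer_Federation SPLIT AT (age = 21)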
Usually federation is not done on GUIDs. Instead, it's done on some key you have more control over. Using GUIDs is fine, but you run the risk of unbalancing the members: one may contain a lot of data while another holds only a little.