Executing multiple Select * in QueryDatabaseTable in NiFi - sql

I want to execute select * from table1, select * from table2, select * from table3, ... select * from table80 (basically extract data from 80 different tables and send the data to 80 different indexes in Elasticsearch/Kibana).
Is it possible for me to give multiple select * statements in one QueryDatabaseTable processor and then route the results to different indexes? If yes, what would the flow look like?

There are a couple of approaches you can take to solve this issue.
If your tables are literally named table1, table2, etc., you can simply generate 80 flowfiles, each with a unique integer value in an attribute (e.g. table_count), and use GenerateTableFetch and ExecuteSQL to build the queries from that attribute via Expression Language.
If the table names are non-sequential (e.g. users, addresses), you can read from a file that lists each table name on its own line, or use ListDatabaseTables to query the database for the names. You can then perform simple text processing to split the created flowfile(s) into one per table and continue as above.
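For illustration, a hedged sketch of the kind of query ExecuteSQL could build from such attributes via Expression Language (table_count is the attribute from the first approach; db.table.name is the attribute ListDatabaseTables writes in the second):
-- sequential table names, driven by a per-flowfile table_count attribute
SELECT * FROM table${table_count}
-- non-sequential table names taken from ListDatabaseTables output
SELECT * FROM ${db.table.name}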

QueryDatabaseTable doesn't accept incoming connections, so this is not possible with a single QueryDatabaseTable processor.
However, you can achieve the same use case with the following flow.
Flow:
1. ListDatabaseTables
2. RouteOnAttribute // optional: filter only the required tables
3. GenerateTableFetch // generates pages of SQL queries and stores state (a sketch of such a paged query is shown below)
4. RemoteProcessGroup (or a load-balanced connection)
5. ExecuteSQL // run more than one concurrent task if needed
6. further processing
7. PutElasticsearch
In addition, if you don't want to run the flow incrementally, remove the GenerateTableFetch processor.
Configure the ExecuteSQL processor's select query as:
select * from ${db.table.schema}.${db.table.name}
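For reference, a hedged sketch of the kind of paged, state-aware statement GenerateTableFetch emits per table; the maximum-value column, timestamp and page size here are purely illustrative:
SELECT * FROM mydb.users
WHERE last_modified > '2019-01-01 00:00:00' -- maximum-value column tracked in processor state
ORDER BY last_modified
LIMIT 10000 OFFSET 0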
Some useful references:
GenerateTableFetch link1 link2
Incrementally run ExecuteSQL processor without using GenerateTableFetch link

Related

Use SQL Field in SSIS Variable

Is it possible to reference a SQL field in your SSIS variable?
For instance, I would like to use the field from the "table" below
Select '999999' AS Physician_Profile_ID
as a dynamic variable (named "CMSPhysProID" in our example) here
I plan on concatenating multiple IDs into an IN statement.
This is possible by using an Execute SQL Task. In the General tab of the Execute SQL Task:
1. Set ResultSet to Single row.
2. Set the connection type to OLE DB.
3. Set the connection and form the SQL statement, as you mentioned: Select '999999' AS Physician_Profile_ID
4. Go to the Result Set pane.
5. Add the variable where you want to store '999999'.
6. Click OK.
If you are looking to store the value within the variable to be used later, you can simply use an Execute SQL Task with a single row result set. More details in the following article:
SSIS Basics: Using the Execute SQL Task to Generate Result Sets
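If the goal is the concatenated IN list itself, a minimal sketch of a query you could place in such an Execute SQL Task, assuming SQL Server 2017+ (for STRING_AGG) and a hypothetical dbo.PhysicianProfiles table; the names are placeholders, not from the original question:
-- returns one row / one column that can be mapped to the SSIS variable CMSPhysProID
SELECT STRING_AGG(CONVERT(varchar(20), Physician_Profile_ID), ',') AS CMSPhysProID
FROM dbo.PhysicianProfiles;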
If you are looking to add a computed column while importing data, you must use a Derived Column Transformation within the data flow task to add a column based on another one. You can refer to the following article for more details about this component:
SSIS Derived Columns with Multiple Expressions vs Multiple Transformations
What are you trying to accomplish by concatenating the IDs into an "IN" statement? If the idea is to use the values of the IDs to limit the results, as a dynamic WHERE clause, you may have better luck just using a lookup against either a table you maintain with the desired IDs or even a static list generated in the package with a script task. (If you can use the lookup table method it will be much easier to maintain as you only have to update a table, not your source code.)
Alternatively, you may even be able to accomplish the goal with a join. Create a temp table from the profile IDs you want to keep and join to it, or, again, use it as a lookup component. Dynamically creating a where clause using IN will come in a lot slower and will be cumbersome to maintain.
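To make the lookup/join idea concrete, a hedged sketch; the source table and the maintained ID table are hypothetical names:
-- keep only the rows whose profile ID appears in a table you maintain
SELECT s.*
FROM dbo.SourceData AS s
INNER JOIN dbo.ProfileIDsToKeep AS k
    ON k.Physician_Profile_ID = s.Physician_Profile_ID;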

How can I schedule a script in BigQuery?

At last BigQuery supports using ; in queries, so I can write more than one statement in one "block" if I separate them with semicolons.
If I run the code manually, it works. But I cannot schedule it.
When I want to schedule, I have two choices:
(New) Web UI: I must give a destination table; if I don't, I cannot save the scheduled query. But all my queries are updates and inserts with different "destination tables", like these:
UPDATE project.exampledataset.a
SET date = current_date()
WHERE true
;
INSERT INTO project.otherdataset.b
SELECT c,d
FROM project.otherdataset.c
So I cannot even make a scheduling in the Web UI.
Classic UI: I tried this because the official documentation states that I should leave the "destination table" blank, and the Classic UI allows it. I can set up the scheduling, but it doesn't run when it should. I get the error message in email: "Error status: Dataset specified in the query ('') is not consistent with Destination dataset 'exampledataset'."
AFAIK scripting (and using semicolons) is a very new feature in BigQuery, but I hope someone can help me.
Yes, I know that I could schedule every query one by one, but I would like to resolve it with one big script.
It looks like the scheduled query was defined earlier with a destination dataset and an APPEND/TRUNCATE write disposition. When updating the same scheduled query to a DML query, the GUI doesn't let you clear the dataset field / table name, so this error occurs because the previously set dataset and table name are still attached to the scheduled query.
Hence the fix is to delete the scheduled query and create it from scratch with the DML query option. It worked for me.
Scripting is supported in scheduled queries now. However, a scripted query, when scheduled, does not currently support setting a destination table. You still need to use DDL/DML to make changes to an existing table.
E.g.:
CREATE OR REPLACE TABLE destinationTable AS
SELECT *
FROM sourceTable
WHERE date >= maxDate
As of 2022, the BQ Console UI will let you create a new scheduled query without a destination dataset, but it won't let you update a prior SELECT to use DDL/DML block syntax. However, you can use the BigQuery Data Transfer API to update the destinationDatasetId field, via transferconfigs/patch. Use transferconfigs/list to get the configId for a given scheduled query.
Note that you can either use the in-browser API Explorer, if you have the appropriate credentials, or write a programmatic solution. Also seems useful for setting/updating any other fields, including renaming scheduled queries.

SQL Server - Syntax around UNION and USE functions

I have a series of databases on the same server which I am wishing to query. I am using the same code to query each database and would like the results to appear in a single list.
I am using 'USE' to specify which database to query, followed by creating some temporary tables to group my data, before using a final SELECT statement to bring together all the data from the database.
I am then using UNION, followed by a second USE command for the next database and so on.
SQL Server is showing a syntax error on the word 'UNION' but does not give any assistance as to the source of the problem.
Is it possible that I am missing a character? At present I am not using ( or ) anywhere.
The USE statement just redirects your session to connect to a different database on the same instance; you don't actually need to switch from database to database in this manner (there are a few rare exceptions, though).
Use three-part naming to combine your result sets. You can do this while connected to any database.
SELECT
SomeColumn = T.SomeColumn
FROM
FirstDatabase.Schema.TableName AS T
UNION ALL
SELECT
SomeColumn = T.SomeColumn
FROM
SecondDatabase.Schema.YetAnotherTable AS T
The engine will automatically check for your login's users on each database and validate your permissions on the underlying tables or views.
UNION adds result sets together; you can't issue another statement (like USE) between the SELECTs that UNION combines.
You should put the database name before the table name:
SELECT valueFromBase1
FROM database1.dbo.table1
WHERE ...
UNION
SELECT valueFromBase2
FROM database2.dbo.table2
WHERE ...

Choose different division of Exact Online when using distributed query with Invantive SQL

I have a set of SQL statements using distributed option of Invantive SQL that extract shipped goods information from Exact Online and create for each serial number shipped a ticket in Freshdesk, together with the consumer as a contact.
This works fine when connected to Exact Online and Freshdesk under one logon code. However, the end user uses a different logon code, and in that case the set of SQL statements retrieves data from their test division in Exact Online instead of the correct production division.
When using no distributed option, I can change the division using:
use 123123
Where 123123 is the unique division number in the Exact Online country.
When connected both to Exact Online and Freshdesk, I get a:
itgenuse002: List of partitions could not be determined.
How can I enforce that the set of SQL statements is executed for a specific Exact Online division instead of the default one set at that moment for the logon code?
Sample SQL query that shows the problem:
create or replace table fulladdress#inmemorystorage --STAP 1.
as
select acad.id
, acad.name
, acad.phone
, acad.email
, acad.addressline1 || ' ' || acad.postcode || ' ' || acad.city fulladdress
from ExactonlineREST..Accounts#eolnl acad
where acad.status = 'C'
The use statement shown works for databases with exactly one data container. In that case, there is only one data container that can handle the question and everything runs smoothly.
With a distributed query in Invantive SQL, you need to tell the use statement which data container to use. Otherwise, the first data container will try to handle it (in this case probably Freshdesk, which has no concept of partitioning). That is similar to appending the data container alias to each table, as in:
select ...
from table#eolnl
join table2#freshdesk
on ...
Here eolnl and freshdesk specify where the tables should be looked up.
So, in this case use:
use 123123#eolnl
The same also holds for the set statement.
From your code it seems you have multiple data containers running in your connection. From the user interface you can only set the partitions on the default data container.
However, there is an easy code solution. You have to know the alias of the data container you want to set the partition for, and then use that alias in a use call (in this sample, 123123 is the partition you want to choose):
use 123123#eolnl
Or to use all partitions available:
use all#eolnl
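Putting it together with the sample query from the question, a sketch of the intended flow (123123 stands in for your own production division number):
use 123123#eolnl

create or replace table fulladdress#inmemorystorage
as
select acad.id
,      acad.name
,      acad.addressline1 || ' ' || acad.postcode || ' ' || acad.city fulladdress
from   ExactonlineREST..Accounts#eolnl acad
where  acad.status = 'C'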

Select from a SQL table starting with a certain index?

I'm new to SQL (using postgreSQL) and I've written a java program that selects from a large table and performs a few functions. The problem is that when I run the program I get a java OutOfMemoryError because the table is simply too big. I know that I can select from the beginning of the table using the LIMIT operator, but is there a way I can start the selection from a certain index where I left off with the LIMIT command? Thanks!
There is an OFFSET option in Postgres, as in:
select * from table
offset 50
limit 50
For MySQL you can use the following approaches:
SELECT * FROM table LIMIT {offset}, row_count
SELECT * FROM table WHERE id > {max_id_from_the_previous_selection} LIMIT row_count. For the first selection, max_id_from_the_previous_selection = 0.
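As a hedged sketch of that second (keyset) approach, assuming an indexed integer id column; note that both approaches need an ORDER BY to page deterministically, and this form works in PostgreSQL as well as MySQL:
-- fetch the next page after the largest id already processed (0 for the first page)
SELECT *
FROM my_table
WHERE id > 50
ORDER BY id
LIMIT 50;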
This is actually something that the jdbc driver will handle for you transparently. You can actually stream the result set instead of loading it all into memory at once. To do this in MySQL, you need to follow the instructions here: http://javaquirks.blogspot.com/2007/12/mysql-streaming-result-set.html
Basically, when you call connection.prepareStatement, you need to pass ResultSet.TYPE_FORWARD_ONLY and ResultSet.CONCUR_READ_ONLY as the second and third parameters, then call setFetchSize(Integer.MIN_VALUE) on your PreparedStatement object.
There are similar instructions for doing this with other databases which I could iterate if needed.
EDIT: now we know you need instructions for PostgreSQL. Follow the instructions here: How to read all rows from huge table?