Spring Batch SQL Query with IN clause

I am new to Spring batch.
I am currently developing a test project to learn Spring batch and I have run into an issue.
My requirement is that I need to query my Oracle database to find the IDs from one table and then pass those IDs to get the details for them from another table. Currently I have roughly 300 IDs.
I can get the IDs, but I am not sure how to pass them all at once into the IN clause of the SQL query that fetches the other fields, which are stored in a different table.
I am also open to other suggestions to solve this issue.
Thanks,
Nik

I can get the IDs, but I am not sure how to pass them all at once into the IN clause of the SQL query that fetches the other fields, which are stored in a different table
You can create:
a first step (tasklet) that gets those IDs and puts them in the execution context
a second step (chunk-oriented) that reads those IDs from the execution context and uses them in the IN clause of the reader's query (a minimal sketch follows below)
Passing data between steps is explained in detail in the Passing Data to Future Steps section of the reference documentation.
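For illustration, here is a minimal sketch of that two-step approach in Java configuration. The table names (PERSON_IDS, PERSON_DETAILS), the "ids" key, and the column-map row mapper are assumptions made up for the example, not anything from your schema; adapt them as needed. The promotion listener must be registered on the first step so the key becomes visible to the second one.

import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

import javax.sql.DataSource;

import org.springframework.batch.core.configuration.annotation.StepScope;
import org.springframework.batch.core.listener.ExecutionContextPromotionListener;
import org.springframework.batch.core.step.tasklet.Tasklet;
import org.springframework.batch.item.database.JdbcCursorItemReader;
import org.springframework.batch.item.database.builder.JdbcCursorItemReaderBuilder;
import org.springframework.batch.repeat.RepeatStatus;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.jdbc.core.ColumnMapRowMapper;
import org.springframework.jdbc.core.JdbcTemplate;

@Configuration
public class IdPassingConfig {

    // Step 1: fetch the IDs and store them in the step execution context.
    @Bean
    public Tasklet fetchIdsTasklet(JdbcTemplate jdbcTemplate) {
        return (contribution, chunkContext) -> {
            List<Long> ids = jdbcTemplate.queryForList(
                    "SELECT ID FROM PERSON_IDS", Long.class); // hypothetical table
            chunkContext.getStepContext().getStepExecution()
                    .getExecutionContext().put("ids", ids);
            return RepeatStatus.FINISHED;
        };
    }

    // Promotes the "ids" key from the step context to the job context (register on step 1).
    @Bean
    public ExecutionContextPromotionListener promotionListener() {
        ExecutionContextPromotionListener listener = new ExecutionContextPromotionListener();
        listener.setKeys(new String[] {"ids"});
        return listener;
    }

    // Step 2: a step-scoped reader that builds the IN clause from the promoted IDs.
    @Bean
    @StepScope
    public JdbcCursorItemReader<Map<String, Object>> detailsReader(
            DataSource dataSource,
            @Value("#{jobExecutionContext['ids']}") List<Long> ids) {
        String inClause = ids.stream()
                .map(String::valueOf)
                .collect(Collectors.joining(", "));
        return new JdbcCursorItemReaderBuilder<Map<String, Object>>()
                .name("detailsReader")
                .dataSource(dataSource)
                .sql("SELECT * FROM PERSON_DETAILS WHERE ID IN (" + inClause + ")") // hypothetical table
                .rowMapper(new ColumnMapRowMapper())
                .build();
    }
}

Note that with roughly 300 IDs you stay well under Oracle's limit of 1000 expressions in an IN list; if the list ever grows past that, you would need to split it into chunks or stage the IDs in a temporary table.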
My requirement is that I need to query my Oracle database to find the IDs from one table and then pass those IDs to get the details for them from another table
I am also open to other suggestions to solve this issue.
I suggest using a common pattern called the Driving Query Pattern, because I think it is a good fit for your requirement. The idea is that the reader fetches only the IDs and a processor looks up the details of each ID from the other table (see the sketch below).
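As a rough sketch of that pattern (again with the hypothetical PERSON_IDS / PERSON_DETAILS tables), the reader streams only the IDs and a processor performs one lookup per ID:

import java.util.Map;

import javax.sql.DataSource;

import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.database.JdbcCursorItemReader;
import org.springframework.batch.item.database.builder.JdbcCursorItemReaderBuilder;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.jdbc.core.JdbcTemplate;

@Configuration
public class DrivingQueryConfig {

    // Driving query: the reader returns only the IDs.
    @Bean
    public JdbcCursorItemReader<Long> idReader(DataSource dataSource) {
        return new JdbcCursorItemReaderBuilder<Long>()
                .name("idReader")
                .dataSource(dataSource)
                .sql("SELECT ID FROM PERSON_IDS") // hypothetical table
                .rowMapper((rs, rowNum) -> rs.getLong("ID"))
                .build();
    }

    // For each driving ID, look up its details row in the second table.
    @Bean
    public ItemProcessor<Long, Map<String, Object>> detailsProcessor(JdbcTemplate jdbcTemplate) {
        return id -> jdbcTemplate.queryForMap(
                "SELECT * FROM PERSON_DETAILS WHERE ID = ?", id); // hypothetical table
    }
}

The trade-off is one extra query per item, which is negligible for around 300 IDs, and it avoids building a long IN clause altogether.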
Hope this helps.

Related

How to get table/column usage statistics in Redshift

I want to find which tables/columns in Redshift remain unused in the database in order to do a clean-up.
I have been trying to parse the queries from the stl_query table, but it turns out this is quite a complex task for which I haven't found any library that I can use.
Anyone knows if this is somehow possible?
Thank you!
The column question is a tricky one. For table-use information I'd look at stl_scan, which records info about every table scan step performed by the system. Each of these is date-stamped, so you will know when the table was "used". Just remember that system logging tables are pruned periodically and the data only goes back a few days, so you may need a process that records table use daily to build an extended history.
I pondered the column question some more. One thought is that query IDs are also provided in stl_scan, and this could help in identifying the columns used in the query text. For every query ID that scans table_A, search the query text for each column name of the table (a rough sketch follows below). It wouldn't be perfect, but it's a start.
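A rough sketch of that idea over plain JDBC, assuming a Redshift connection and made-up table/column names: it finds the queries that scanned the table via stl_scan, then greps each query's (truncated) text for the column names. Imperfect, as noted above.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.List;

public class ColumnUsageCheck {

    public static void main(String[] args) throws SQLException {
        String url = "jdbc:redshift://..."; // placeholder connection details
        String tableName = "table_a"; // hypothetical table to check
        List<String> columns = List.of("col1", "col2"); // hypothetical column names

        // stl_scan gives the query ids that scanned the table; stl_query holds the
        // (truncated) query text. Both logs are pruned after a few days.
        String sql = "SELECT s.query, q.querytxt "
                + "FROM stl_scan s JOIN stl_query q ON q.query = s.query "
                + "WHERE TRIM(s.perm_table_name) = ?";

        try (Connection conn = DriverManager.getConnection(url, "user", "password");
             PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setString(1, tableName);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    String text = rs.getString("querytxt").toLowerCase();
                    for (String column : columns) {
                        if (text.contains(column.toLowerCase())) {
                            System.out.println("query " + rs.getLong("query")
                                    + " appears to use column " + column);
                        }
                    }
                }
            }
        }
    }
}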

Understanding a table's structure/schema in SQL

I wanted to ask if there is a practical way of finding out a given table's structure/schema, e.g., the column names and example rows inserted into the table (like the head function in Python), if you only have the table name. I have access to several tables in my current role; however, the person who developed the tables has left the team. I was interested in examining the tables more closely via SQL Assistant in Teradata (these tables often contain hundreds of thousands of rows, hence there are issues of hitting CPU exception criteria errors).
I have tried the following select statement, but there is an issue of hitting internal CPU exception criteria limits.
SELECT TOP 10 * FROM dbc.table1
Thank you in advance for any tips/advice!
You can use one of these commands to get a table's structure details in Teradata:
SHOW TABLE Database_Name.Table_Name;
or
HELP TABLE Database_Name.Table_Name;
Either command shows the table structure details.
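If you also want a quick look at column names, types, and a few example rows from code, here is a small JDBC sketch; the connection details and table name are placeholders. TOP 10 (with a space) keeps the preview to a handful of rows, and the structure comes from the result set metadata:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.sql.Statement;

public class TablePreview {

    public static void main(String[] args) throws SQLException {
        String url = "jdbc:teradata://..."; // placeholder connection details
        String table = "MyDatabase.MyTable"; // hypothetical table name

        try (Connection conn = DriverManager.getConnection(url, "user", "password");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT TOP 10 * FROM " + table)) {

            // Column names and types come from the result set metadata
            ResultSetMetaData meta = rs.getMetaData();
            for (int i = 1; i <= meta.getColumnCount(); i++) {
                System.out.println(meta.getColumnName(i) + " : " + meta.getColumnTypeName(i));
            }

            // Print the sample rows, tab-separated
            while (rs.next()) {
                StringBuilder row = new StringBuilder();
                for (int i = 1; i <= meta.getColumnCount(); i++) {
                    row.append(rs.getString(i)).append('\t');
                }
                System.out.println(row);
            }
        }
    }
}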

Generate a series of queries in Pentaho

How do I build a general query based in a list of tables and run/export the results?
Here is my conceptual structure:
Basically, the FROM clause in the query would be filled in from the table rows.
After that, each schema.table that returns a true result must be stored in a text file.
Could someone help me?
As Pentaho doesn't support loops, it's a bit complicated. Basically, you read a list of tables, copy the rows to the result, and then have a foreach job that runs once for every row in the result.
I use this setup for all jobs like this: How to process a Kettle transformation once per filename
In the linked example, it is used to process files, but it can be easily adapted to your use.

Does Tabledata.list() count towards compute usage in BigQuery?

They say there are no stupid questions, but this might be an exception.
I understand that BigQuery, being a columnar database, scans the full column for any query that references that column.
I also understand that query results can be cached or a named table can be created with the results of a query.
However, I also see tabledata.list() in the documentation, and I'm unsure of how it fits in with query costs. Once a table is created from a query, am I free to access that table through the API without cost?
Let's say, for example, I run a query that is grouped by UserID, and I want to then present the results of that query to individual users based on that ID. As far as I understand there are two obvious ways of getting out the appropriate row for doing so.
I can write another query over the destination table with a WHERE userID=xxx clause
I can use the tabledata.list() endpoint to get all the (potentially paginated) data and get the appropriate row myself in my code
Am I right that situation 1 would incur a query cost and situation 2 would not?
The tabledata.list API is free, as it does not actually use the BigQuery engine at all,
so you are right about both 1 and 2.
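A minimal sketch with the BigQuery Java client library, assuming hypothetical dataset/table names for the query's destination table; listTableData uses tabledata.list under the hood, so no query job is run:

import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.FieldValueList;
import com.google.cloud.bigquery.TableId;
import com.google.cloud.bigquery.TableResult;

public class ListDestinationTable {

    public static void main(String[] args) {
        BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

        // Hypothetical dataset/table created as the query's destination
        TableId tableId = TableId.of("my_dataset", "user_results");

        // Backed by tabledata.list: no query job is started
        TableResult result = bigquery.listTableData(tableId);

        for (FieldValueList row : result.iterateAll()) {
            // Access by position; listTableData does not attach the schema by default
            System.out.println(row.get(0).getStringValue());
        }
    }
}

Note that this reads rows in storage order and cannot filter by UserID server-side, so for per-user lookups you would still page through the rows and filter in your own code.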

Unit Testing tables with SSDT

I am looking to do some row counts for a handful of tables after our deployment in our lower-level environments. I have a project that deploys a DB to SQL Server and loads some data into it. I want to validate that the tables are now populated with data. I have read the MSDN documentation on creating unit tests, but I have a few outstanding questions.
Can I only create unit tests against stored procs and functions, or can I simply get a row count from a table or view and test against that?
Can I run multiple "tests" at once? For example, if I want to get the row count for 6 tables, do I need to create a separate test for each table, or can I batch them all together?
Sorry if I missed a large part of the walk through, but the documentation on this was not very helpful pertaining to these questions.
To test a procedure or function, you simply call that procedure or function and verify the result. There is no difference between a SELECT COUNT(*) FROM xxx statement and an EXEC dbo.Procedure statement.
Yes. In the test conditions you can specify which result set to verify. You can also union all the row counts in a single query and use a checksum test condition (a rough sketch of such a query follows below).
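To illustrate the second point, the single query could look like the statement below. This is not an SSDT test artifact itself, just a sketch of the union idea over hypothetical table names, run here through plain JDBC; in SSDT you would paste the SELECT into the test and attach a row count or checksum test condition to it.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class RowCountCheck {

    public static void main(String[] args) throws SQLException {
        String url = "jdbc:sqlserver://..."; // placeholder connection details

        // One statement returning one row per table; the table names are hypothetical
        String sql = "SELECT 'dbo.TableA' AS table_name, COUNT(*) AS row_count FROM dbo.TableA "
                + "UNION ALL SELECT 'dbo.TableB', COUNT(*) FROM dbo.TableB "
                + "UNION ALL SELECT 'dbo.TableC', COUNT(*) FROM dbo.TableC";

        try (Connection conn = DriverManager.getConnection(url, "user", "password");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(sql)) {
            while (rs.next()) {
                long count = rs.getLong("row_count");
                System.out.println(rs.getString("table_name") + ": " + count + " rows");
                if (count == 0) {
                    throw new AssertionError(rs.getString("table_name") + " is empty");
                }
            }
        }
    }
}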