How to check if filenames in specific folder are inside database table? - sql

Start situation:
a folder with lots of files (images mostly png).
two tables in database (MariaDB) which contains supposed image filenames. I query the filenames like this:
select filename from table1
UNION
select filename from table2;
I want to know if I have files not registered in the database tables.
My first approach is to put the list of filenames inside a textfile (I've used Linux command line, list is filename per line), but I don't know how to continue.
I can't write in the database. UPDATED. I got more auth to perform my job tasks. Therefore I solved this with the suggestion.

You need to do couple of things:
Import the file with list of files into the database.
Use cursor to go through the list of file names and match it to your table list which contains the list-b of file names
To import file name you can use the import method or directly read the file as a table virtually.
Thanks

Related

how to read a tab delimited .txt file and insert into oracle table

I want to read a tab delimited file using PLSQL and insert the file data into a table.
Everyday new file will be generated.
I am not sure if external table will help here because filename will be changed based on date.
Filename: SPRReadResponse_YYYYMMDD.txt
Below is the sample file data.
Option that works on your own PC is to use SQL*Loader. As file name changes every day, you'd use your operating system's batch script (on MS Windows, these are .BAT files) to pass a different name while calling sqlldr (and the control file).
External table requires you to have access to the database server and have (at least) read privilege on its directory which contains those .TXT files. Unless you're a DBA, you'll have to talk to them to provide environment. As of changing file name, you could use alter table ... location which is rather inconvenient.
If you want to have control over it, use UTL_FILE; yes, you still need to have access to that directory on the database server, but - writing a PL/SQL script, you can modify whatever you want, including file name.
Or, a simpler option, first rename input file to SPRReadResponse.txt, then load it and save yourself of all that trouble.

Trying to create a table and load data into same table using Databricks and SQL

I Googled for a solution to create a table, using Databticks and Azure SQL Server, and load data into this same table. I found some sample code online, which seems pretty straightforward, but apparently there is an issue somewhere. Here is my code.
CREATE TABLE MyTable
USING org.apache.spark.sql.jdbc
OPTIONS (
url "jdbc:sqlserver://server_name_here.database.windows.net:1433;database = db_name_here",
user "u_name",
password "p_wd",
dbtable "MyTable"
);
Now, here is my error.
Error in SQL statement: SQLServerException: Invalid object name 'MyTable'.
My password, unfortunately, has spaces in it. That could be the problem, perhaps, but I don't think so.
Basically, I would like to get this to recursively loop through files in a folder and sub-folders, and load data from files with a string pattern, like 'ABC*', and load recursively all these files into a table. The blocker, here, is that I need the file name loaded into a field as well. So, I want to load data from MANY files, into 4 fields of actual data, and 1 field that captures the file name. The only way I can distinguish the different data sets is with the file name. Is this possible? Or, is this an exercise in futility?
my suggestion is to use the Azure SQL Spark library, as also mentioned in documentation:
https://docs.databricks.com/spark/latest/data-sources/sql-databases-azure.html#connect-to-spark-using-this-library
The 'Bulk Copy' is what you want to use to have good performances. Just load your file into a DataFrame and bulk copy it to Azure SQL
https://docs.databricks.com/data/data-sources/sql-databases-azure.html#bulk-copy-to-azure-sql-database-or-sql-server
To read files from subfolders, answer is here:
How to import multiple csv files in a single load?
I finally, finally, finally got this working.
val myDFCsv = spark.read.format("csv")
.option("sep","|")
.option("inferSchema","true")
.option("header","false")
.load("mnt/rawdata/2019/01/01/client/ABC*.gz")
myDFCsv.show()
myDFCsv.count()
Thanks for a point in the right direction mauridb!!

Query for finding all occurrences of a string in a database

I'm trying to find a specific string on my database. I'm currently using FlameRobin to open the FDB file, but this software doesn't seems to have a properly feature for this task.
I tried the following SQL query but i didn't work:
SELECT
*
FROM
*
WHERE
* LIKE '126278'
After all, what is the best solution to do that? Thanks in advance.
You can't do such thing. But you can convert your FDB file to a text file like CSV so you can search for your string in all the tables/files at the same time.
1. Download a database converter
First step you need a software to convert you databse file. I recommend using Full Convert to do it. Just get the free trial and download it. It is really easy to use and it will export each table in a different CSV file.
2. Find your string in multiple files at the same time
For that task you can use the Find in files feature of Notepad++ to search the string in all CSV files located at the same folder.
3. Open the desired table on FlameRobin
When Notepad++ highlight the string, it shows in what file it is located and the number of the line. Full Convert saves each CSV with the same name as the original table, so you can find it easily whatever database manager software you are using.
Here is Firebird documentation: https://www.firebirdsql.org/file/documentation/reference_manuals/fblangref25-en/html/fblangref25.html
You need to read about
Stored Procedures of "selectable" kind,
execute statement command, including for execute statement variant
system tables, having "relation" in names.
Then in your SP you do enumerate all the tables, then you do enumerate all the columns in those tables, then for every of them you run a usual
select 'tablename', 'columnname', columnname
from tablename
where columnname containing '12345'
over every field of every table.
But practically speaking, it most probably would be better to avoid SQL commands and just to extract ALL the database into a long SQL script and open that script in Notepad (or any other text editor) and there search for the string you need.

inserting multiple text files

I have 4 different text files each file with different name and different column in it place in one folder. I want these 4 four files to be inserted or updated into 4 different existing tables. So How to read read these 4 files dynamically and insert them into their respective table dynamically in SSIS.
Well, you need to use Data Flow Task to move data from a Flat File Source to a Table Destination (OLEDB Destination perhaps). Are the columns in your file delimited in any way? For example, with any of these: (;),(|) or something like that? if it is, you can create a FlatFileConnectionManager and set that to split the columns. If not, you might need to use the FixedWidth option to separate your columns. To use the OLEDB Destination, you will need to create a OLEDB connectionManager to point to the table in your database. I could help you more if I had more information about the files you want to read the data from.
EDIT
Well you said at the start you were working with 4 files and 4 tables, so you can create 4 Flat Destination sourcers with 4 OLEDB destinations aswell (1 of each for each flat file). If I understood you correctly, these 4 files can or cannot exist yet. So if you know the names that the files will get, change the Package Property DelayValidation to true, and then create a connection with a sample text file. You do this so the File path gets saved. The tables, in my opinion DO need to exist. Now, when you said:
i want to load all the text files into each different existing table whenever there is files inside the folder.
The only way I know you can do something similar, is to schedule the execution of your package at a certain time with SQL Server Agent Job. Please let me know if this was what you were looking for.

Dynamically populate external tables location

I'm trying to use oracle external tables to load flat files into a database but I'm having a bit of an issue with the location clause. The files we receive are appended with several pieces of information including the date so I was hoping to use wildcards in the location clause but it doesn't look like I'm able to.
I think I'm right in assuming I'm unable to use wildcards, does anyone have a suggestion on how I can accomplish this without writing large amounts of code per external table?
Current thoughts:
The only way I can think of doing it at the moment is to have a shell watcher script and parameter table. User can specify: input directory, file mask, external table etc. Then when a file is found in the directory, the shell script generates a list of files found with the file mask. For each file found issue a alter table command to change the location on the given external table to that file and launch the rest of the pl/sql associated with that file. This can be repeated for each file found with the file mask. I guess the benefit to this is I could also add the date to the end of the log and bad files after each run.
I'll post the solution I went with in the end which appears to be the only way.
I have a file watcher than looks for files in a given input dir with a certain file mask. The lookup table also includes the name of the external table. I then simply issue an alter table on the external table with the list of new file names.
For me this wasn't much of an issue as I'm already using shell for most of the file watching and file manipulation. Hopefully this saves someone searching for ages for a solution.