How to upload multiple .csv files to PostgreSQL?

I'm using pgAdmin with PostgreSQL.
There are 10 .csv files on my computer, each containing a different table. All 10 tables hold the same type of data, just split across 10 different months. Basically I need one table covering all the months. My first thought was to create 10 tables, import the 10 .csv files into them, and then combine them into a single table.
To create one table I perform the following steps: create the table, define the column names, set the data type of each column, and import the .csv file into the table. Then I repeat the same operation nine times for the remaining .csv files. Is there any way to upload all 10 files in one step? Thank you
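For reference, a minimal sketch of how this can be collapsed into a single SQL script (the column layout, table name, and file paths below are placeholders; note that COPY reads files on the database server, so for files on your own machine use psql's \copy or pgAdmin's import tool per file instead):

    -- Create the target table once; columns and types are placeholders
    CREATE TABLE monthly_data (
        sale_date  date,
        client_id  integer,
        amount     numeric(12,2)
    );

    -- One COPY per file, all run as a single script
    COPY monthly_data FROM '/data/month_01.csv' WITH (FORMAT csv, HEADER true);
    COPY monthly_data FROM '/data/month_02.csv' WITH (FORMAT csv, HEADER true);
    -- ... repeat for the remaining eight files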

Related

Processing CSV files with Metadata in S3 bucket in Pentaho

I have a CSV file that goes something like this:
Report Name: Stackoverflow parse data
Date of Report: 31 October, 2022
Col1, Col2, Col3,...
Data, Data, Data, ...
The rows before the header, essentially metadata stating what the CSV is for and when it was created (they can span a variable number of rows), need to be removed so I can parse the file in Pentaho. The CSV files are in an S3 bucket and I am fetching them using the S3 CSV Input step, but I am not sure how to filter out the unwanted rows so I can successfully parse the CSV files.
You can read the complete file as a CSV with only one column, adding the row number to the output. Then you apply a filter to get rid of the first n rows, and then use the Split fields step to separate the rows into columns.
You'll need more steps to convert numbers and dates into the correct format (the Split fields step gives you strings), and maybe more operations to preformat some other columns.
Or you could create a temporary copy of your S3 CSV file without the first n rows and read that file instead of the original one.
Step 1: In the CSV input step, add the row number.
Step 2: Use a Filter rows step to drop the unwanted rows.
Step 3: Add an output step such as a CSV or database output.

OpenCSV writing to multiple files every n record

How can I write to a new file in a loop based on every Nth record? If I have 1002 records and I want to create a file every 500 records, I should end up with 3 files. Currently, all records are written to the first file; the other two files are created but none of the records end up in them.

Issues Exporting from T-SQL to Excel including column names & sorting on a column that's not exported

I have a SQL table with ERP data containing over 1 million records for seven different companies. This data needs to be exported to Excel files breaking the records by company, and for each company by a date range. Based on the companies and date ranges it will produce > 100 separate files.
The results are being imported into a second ERP system which requires the import file to be Excel format and must include the column names as the first row. The records need to be sorted by a [PostDate] column which must NOT be exported to the final record set.
The client's SQL server does not have the MS Jet Engine components installed, so I cannot export directly to Excel, and the client will not allow the MS Jet Engine to be installed on this server.
I already had code from a similar project that exported to CSV files so have been attempting that. My T-SQL script loops through the table, parses out the records by company and by date range for each iteration, and uses BCP to export to CSV files. The BCP command includes the column names as the first row by using an initial SELECT 'Col_1_Name', 'Col_2_Name', etc. with a UNION SELECT [Col_1], [Col_2],...
This UNION works AS LONG AS I INCLUDE THE [PostDate] column needed for the ORDER BY in the SELECT. This exports the records I need in proper order, with column names, but also includes the [PostDate] column which the import routine will not accept. If I remove [PostDate] from the SELECT, then the UNION with ORDER BY [PostDate] fails.
We don't want to have the consultant spend the time to delete the unwanted column from each file for 100+ files.
Furthermore, one of the VarChar columns being exported ([Department]) contains rows that have a leading zero, "0999999" for example.
The user opens each CSV file by double-clicking it in Windows File Explorer to review the data, notes that the [Department] values still show the leading zero, then saves the file as Excel and closes it to launch the import into the second ERP system. This process drops the leading zeros from [Department], which causes the import to fail.
How can I (1) export directly to Excel, (2) including column names in row 1, (3) sorting the rows by [PostDate] column, (4) excluding [PostDate] column from the exported results, and (5) preserving the leading zeros in [Department] column?
You could expand my answer to the question "SSMS: Automatically save multiple result sets from same SQL script into separate tabs in Excel?" by adding the sort functionality you require.
Alternatively, a better approach would be to use SSIS.
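For what it's worth, if the BCP/CSV route stays in play, one way around the UNION vs. ORDER BY limitation is to carry the sort keys only inside a derived table and order the outer query by them, so [PostDate] is never exported. A rough sketch (all table and column names are placeholders), which also shows a common trick for keeping the leading zeros in [Department] when the file is later opened in Excel:

    SELECT Col_1, Department
    FROM (
        SELECT 'Col_1_Name'           AS Col_1,
               'Department'           AS Department,
               0                      AS SortGroup,
               CAST(NULL AS datetime) AS SortDate
        UNION ALL
        SELECT CAST([Col_1] AS varchar(100)),
               -- wrapping the value as ="0999999" makes Excel treat it as text,
               -- so the leading zero survives the CSV-to-Excel round trip
               '="' + [Department] + '"',
               1,
               [PostDate]
        FROM dbo.ErpExport            -- placeholder table name
    ) AS src
    ORDER BY SortGroup, SortDate;     -- header row first, then data sorted by [PostDate]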

Automate Same Query (and Export) for Multiple Values

I am working with a database with 5 tables, all of which contain different sets of information about clients and their employees. If you drill down into any table for data relating to a particular client, many rows are returned, corresponding to the number of that client's employees in the dataset.
If I were to run the reports manually, I would query each table, one at a time, for all results where a particular client number is specified. Then I would export each table to a .csv and copy those exports into the same Excel workbook with 5 tabs (corresponding to the 5 tables in the SQL database). At the end I would have an individual workbook for each client.
A complicating factor is that not every client ID appears in each of the 5 tables. Preferably, I would not export empty datasets; clients with data in only three of the tables would have only three tabs in the final workbook.
Is there a way to give SQL Server a list of client IDs for which it should query the 5 tables, export the existing data, and (possibly / hopefully) combine the results in a workbook on separate tabs?
Your question is rather vague and broad, but here are the key bits you'll need to investigate to get things going:
Create five different datasets, each querying one table.
Create five tablixes, add a PageBreak before each tablix so in Excel they will land on different tabs.
Either set a NoRowsMessage or hide the tablix using an expression based on the RowNumber function.
Create a parameter for selecting the client ID, and use it in the WHERE clause of each dataset (see the sketch after this list).
The tricky bit is how to generate multiple Excel files. SSRS does one export at a time, so your basic options are:
Put multiple clients in one XLS (i.e. don't use a parameter, but include clientId as a column on the worksheets)
Have the user select one client at a time, and export one XLS at a time.
Automate generating the reports.
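As a rough illustration of the parameter point above, each of the five datasets would use a query along these lines, where @ClientID is the report parameter and the table and column names are assumptions:

    SELECT t.*
    FROM dbo.EmployeeDetails AS t   -- placeholder for one of the five tables
    WHERE t.ClientID = @ClientID;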

How to load 1 week data into SQL table from Flat file in SSIS

I want to load flat file data into a SQL table on a daily basis, but the table should hold only 1 week of data. After 7 days the previous data should be deleted from the table and new data appended.
Thanks
Is your flat file delimited? Commas? Tabs? How many fields are there? What types are they?
If you only want one week's worth of data in the table at any one time, you will need to delete after you do the daily import. If you import daily but only delete once a week, you will have almost two weeks' worth of data in the table.
What have you tried already?
Before or after you run the import, delete any data that is more than 6 days old.
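A minimal sketch of that cleanup, assuming the table has a column recording when each row was loaded (table and column names are placeholders):

    -- Run as an Execute SQL Task before (or after) the daily data flow
    DELETE FROM dbo.WeeklyStaging
    WHERE LoadDate < DATEADD(DAY, -6, CAST(GETDATE() AS date));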