Creation of DB Structure for particular Excel Sheet formats

I have basic knowledge of SQL queries. 
Problem Statement:
Every month I get an Excel sheet with the transaction name as one column and the remaining n columns as dates. The number of rows (transaction names) is fixed, but the dates in a month may vary. I have also attached a screenshot of the Excel file.
Excel table
How should I create a table and structure it in the DB?
How should I import this particular excel table with values in the SQL DB?
Kindly please help me!
Thanks.

You can create a table in this format:
CREATE TABLE table_name (transaction_name VARCHAR(40), txn_date DATE, txn_value FLOAT);
(The columns are not simply called Transaction and values, because those are reserved words in several SQL dialects.)
The table looks like this:
transaction_name | txn_date | txn_value
billing_ 1 | 11-oct-2019 | 1.2006
billing_ 1 | 12-oct-2019 | 2.2006
billing_ 2 | 11-oct-2019 | 1.2006
billing_ 2 | 12-oct-2019 | 2.2006
To import the Excel data, first reshape the sheet into this format (one row per transaction and date). After that it can be imported into the database easily.
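The reshaping step can be sketched in Python. This is only an illustration under assumptions: the sheet is assumed to have been exported so that the first header cell is the transaction label and the remaining header cells are dates like 11-oct-2019; the table name monthly_values, its column names, the sample rows, and the in-memory SQLite database are all hypothetical stand-ins for your real setup.

```python
import sqlite3
from datetime import datetime

def unpivot(header, rows, date_format="%d-%b-%Y"):
    """Turn one wide row per transaction into (transaction, date, value) rows."""
    # Every header cell after the first is assumed to be a date.
    dates = [datetime.strptime(h, date_format).date() for h in header[1:]]
    long_rows = []
    for row in rows:
        name = row[0]
        for day, value in zip(dates, row[1:]):
            long_rows.append((name, day.isoformat(), float(value)))
    return long_rows

# Hypothetical sheet contents, e.g. read from a CSV export of the Excel file.
header = ["Transaction", "11-oct-2019", "12-oct-2019"]
rows = [
    ["billing_1", "1.2006", "2.2006"],
    ["billing_2", "1.2006", "2.2006"],
]
long_rows = unpivot(header, rows)

# Load the reshaped rows into a database (SQLite in memory, as a stand-in).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE monthly_values "
    "(transaction_name TEXT, txn_date TEXT, txn_value REAL)"
)
conn.executemany("INSERT INTO monthly_values VALUES (?, ?, ?)", long_rows)
conn.commit()
```

Because the dates become row values instead of column names, a month with 28 days and a month with 31 days both fit the same table without any schema change.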

Related

Transpose/Pivot Excel file in Pentaho (using multiple files)

I've been having some trouble with the following situation: There's an Excel file I need to use which has the information in the following format:
ColumnA | ColumnB
Name | John
Business | Pentaho
Address | Evergreen 123
Job type | Food processing
NameBoss | Boss lv1
Phone | 555-NoPhone
Mail | thisATmail
What I need to do is turn each value in column A into its own column, ending up with 7 different columns, each holding one value, which is the data from column B. Additionally, the integration reads the filename as an extra output field:
SELECT
'${FILES_ROOT}/proyectos/BUSINESS_NAME/B_NAME_OPER/archivos_fuente/NÓMINA BAC - ' ||nombre_empresa||'.xlsx' as nombre_archivo
--, nombre_empresa
FROM "public".maestro_empresa
The transformation I have for the Excel file looks like this:
As can be seen, I added each column manually in the fields tab of the transformation, since the data in the Excel file does not have headers.
With this done, I am not sure how to proceed from here in order to get the transposed data I need. What can I do?
The end result I am looking for is something like this:
Name | Business | Address | Job type | NameBoss | Phone | Mail | excel_name
John | Pentaho | Evergreen 123 | Food processing | Boss lv1 | 555-NoPhone | thisAtMail | ExcelName.xlsx

You can do this easily with the 'Row denormaliser' step. First take the input from the Excel file, then pass it to the 'Row denormaliser' step. You can see a sample HERE.
Note: Remove the "Id" column from my sample if you always expect to get a single line.
If your ColumnA values are dynamic (not known in advance), you can use THIS Metadata Injection sample, which reads the same Excel input twice but does not require you to specify the column names. Please run the transformation "MetaDataInjectionPV.ktr".
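To make clear what the denormalising step produces, here is a rough Python sketch of the same reshaping. It is not how Pentaho implements the step, only an illustration: the function name denormalise is hypothetical, and the key/value pairs are assumed to have already been read from one Excel file.

```python
def denormalise(pairs, file_name):
    """Collapse the (key, value) rows of one file into a single wide record."""
    record = {key: value for key, value in pairs}
    # Carry the source file name along as an extra output field.
    record["excel_name"] = file_name
    return record

# Rows as they appear in the sample file (ColumnA, ColumnB).
pairs = [
    ("Name", "John"),
    ("Business", "Pentaho"),
    ("Address", "Evergreen 123"),
    ("Job type", "Food processing"),
    ("NameBoss", "Boss lv1"),
    ("Phone", "555-NoPhone"),
    ("Mail", "thisATmail"),
]
record = denormalise(pairs, "ExcelName.xlsx")
```

With several files, calling the function once per file gives one record per file, and the keys do not need to be listed up front, which is roughly what the metadata injection variant achieves.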

Need Column data to be the ROW header for my query

I am trying to use a LATERAL JOIN on a particular data set, but I cannot seem to get the syntax right for the query.
What I am trying to achieve:
Take the first column in the dataset (see picture) and use its values as the table headers, then populate the row with the data from the StringValue column.
Currently it appears like this:
cfname | stringvalue |
----------------------------------------
customerrequesttype | newformsubmission|
Assignmentgroup | ITDEPT |
and I would like to have it appear as this:
customerrequesttype| Assignmentgroup|
-------------------------------------
newformsubmission | ITDEPT
As mentioned, I am very new to SQL and know only limited basics.

Transpose variable number of rows into columns in OpenRefine

I have an xml file containing records from a library catalogue. I have imported it into OpenRefine but all the values are in one column. I want to transpose it so each field in the record has its own column. However, this is complicated by the fact that a) each field is optional so does not exist in all records and b) many fields are repeatable so can appear multiple times in each record. Here's a simplified example of what the data looks like:
| RecordID | Tag | Data |
| 1 | 040a | CaABCD |
| 1 | 245a | Go fish |
| 1 | 245a | A guide to fish |
| 1 | 246i | Fish series |
| 1 | 260a | Fishing friends |
| 2 | 040a | CaABDC |
| 2 | 245a | Happy trails |
| 2 | 246i | Hiking series |
| 2 | 260i | The happy hiker |
| 2 | 500a | Notes |
I have read the Q&A here Openrefine - Transpose rows into columns based on text but the problem with this solution is that if I concatenate all the values together I have no way to be sure what field they belong in anymore, as my data is much more complicated than the data in that question (my actual data has 25+ fields and many thousands of records).
I was able to get closer using Google Sheets and making a pivot table with a calculated field (as in PivotTable to show values, not sum of values - see the answer at the very bottom). However, I still don't know how to handle the repeating fields. In the pivot table the multiple values are there but only the first displays (double-clicking on an individual cell brings up a details table which lists all the values), so when I copy-paste the table I lose the additional values. I would like to concatenate them but I cannot see a way to do so within the pivot table.
Can you think of any other way I could do this, in OpenRefine or another tool? Thanks!

The classic way to do this in OpenRefine is "Transpose -> Columnize by key/value columns". This feature is poorly documented, though, and can cause headaches even for OpenRefine developers. In your case the repeated fields will be problematic, so here is a possible solution.
1° Go to the "Tag" column, click on "Transpose -> Columnize by key/value columns" and use the following configuration (don't forget the "Note column (optional)").
The result will look like this (my dataset is not exactly the same as yours; I modified a value for testing).
2° In the new column "Record ID: 040 a", click on "Edit column -> Move column to beginning".
3° If you want to merge the repeated fields, go to each column that contains them and click on "Edit cells -> Join multi-valued cells", choosing a separator, for example "|".
The end result will look like this.
To get rid of unnecessary columns, click on "Export -> Custom tabular exporter" and deselect the columns whose names start with RecordId.
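If you would rather do this outside OpenRefine, the same idea (one column per tag, repeated values joined with a separator) can be sketched in Python. This is an illustration, not what OpenRefine does internally: the function name columnize is hypothetical, and the rows are assumed to be available as (RecordID, Tag, Data) tuples.

```python
from collections import defaultdict

def columnize(rows, separator="|"):
    """Pivot (record_id, tag, data) rows into one row per record.

    Repeated tags within a record are joined with the separator,
    and tags missing from a record become empty strings.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    tags = []  # column order: first appearance in the input
    for record_id, tag, data in rows:
        grouped[record_id][tag].append(data)
        if tag not in tags:
            tags.append(tag)
    table = []
    for record_id, fields in grouped.items():
        row = {"RecordID": record_id}
        for tag in tags:
            row[tag] = separator.join(fields.get(tag, []))
        table.append(row)
    return tags, table

# The simplified example data from the question.
rows = [
    (1, "040a", "CaABCD"),
    (1, "245a", "Go fish"),
    (1, "245a", "A guide to fish"),
    (1, "246i", "Fish series"),
    (1, "260a", "Fishing friends"),
    (2, "040a", "CaABDC"),
    (2, "245a", "Happy trails"),
    (2, "246i", "Hiking series"),
    (2, "260i", "The happy hiker"),
    (2, "500a", "Notes"),
]
tags, table = columnize(rows)
```

Pick a separator that cannot occur in the data itself, otherwise the joined values cannot be split apart again later.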

OpenRefine also has a native MARC importer, which might be worth trying if you need to work with MARC data in the future. MarcEdit also has some specific OpenRefine support built in.

Export varbinary to file (image) from multiple rows

I have an MSSQL 2k8 database, and in it I have a table in the format below.
Employee Number | Segment | Data (varbinary(8000))
----------------------------------------------------------
111111 | 1 | 0x01234567...DEF
111111 | 2 | 0x01234567...DEF
111111 | 3 | 0x01234567...DEF
The data (varbinary) column makes up a picture, but unfortunately it is split into multiple segments by a process I cannot control.
Is there a way to export this data to a file via an SQL script/procedure? I have seen some questions that answer this for a varbinary(max) column, but I can't for the life of me work out how to stitch these all together into one file.
Note: Some of the files have >500 segments, but this procedure will not be occurring very regularly.

If the picture can be reconstructed by simply concatenating all of the segments, then you could try execsql.py, which is a SQL script processor written in Python (by me). It has a metacommand of this form:
EXPORT <table_or_view> TO <filename> AS RAW
which will concatenate all columns and rows in the given table or view.
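The underlying idea, reading the segments in order and appending them to one file, can also be sketched directly in Python. This is a minimal sketch under assumptions, not part of execsql.py: the table name employee_photo and its column names are hypothetical, the bytes are dummy data, and an in-memory SQLite database stands in for SQL Server (against the real server you would open the connection with a SQL Server driver instead).

```python
import os
import sqlite3
import tempfile

def export_segments(conn, employee_number, path):
    """Write all segments for one employee to a file, in segment order."""
    cursor = conn.execute(
        "SELECT data FROM employee_photo "
        "WHERE employee_number = ? ORDER BY segment",
        (employee_number,),
    )
    with open(path, "wb") as out:
        for (chunk,) in cursor:
            out.write(chunk)

# In-memory stand-in for the real table (hypothetical names, dummy bytes).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee_photo "
    "(employee_number INTEGER, segment INTEGER, data BLOB)"
)
conn.executemany(
    "INSERT INTO employee_photo VALUES (?, ?, ?)",
    [
        (111111, 2, b"\x03\x04"),  # deliberately inserted out of order
        (111111, 1, b"\x01\x02"),
        (111111, 3, b"\x05\x06"),
    ],
)

path = os.path.join(tempfile.mkdtemp(), "111111.jpg")
export_segments(conn, 111111, path)
with open(path, "rb") as f:
    result = f.read()
```

The ORDER BY on the segment column is what matters here: without it the database is free to return the rows in any order and the picture would be scrambled.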

Flat File Import: Remove Data

(Posted a similar question earlier but HR department changed conditions today)
Our HR department has an automated export from our SAP system in the form of a flat file. The information in the flat file looks like so.
G/L Account 4544000 Recruiting/Job Search
Company Code 0020
--------------------------
| Posting Date| LC amnt|
|------------------------|
| 01/01/2013 | 406.25 |
| 02/01/2013 | 283.33 |
| 03/21/2013 |1,517.18 |
--------------------------
G/L Account 4544000 Recruiting/Job Search
Company Code 0020
--------------------------
| Posting Date| LC amnt|
|------------------------|
| 05/01/2013 | 406.25 |
| 06/01/2013 | 283.33 |
| 07/21/2013 |1,517.18 |
--------------------------
When I look at the data in the SSIS Flat File Source Connection all of the information is in a single column. I have tried to use the Delimiter set to Pipe but it will not separate the data, I assume due to the nonessential information at the top and middle of the file.
I need to remove the data at the top and middle and then have the Date and Total split into two separate columns.
The goal of this is to separate the data so that I can get a single SUM for the running year.
Year Total
2013 $5123.25
I have tried to do this in SSIS, but I can't seem to separate the columns or remove the data. I want to avoid a script task, as I am not familiar with the code or operation of that component.
Any assistance would be appreciated.

I would create a temp table that can hold the whole flat file, and then do the filtering at the SQL level.
An example:
CREATE TABLE tmp (txtline VARCHAR(MAX))
Load the file into the tmp table with BCP or SSIS.
Then run a query like this to get the result (you may need to adjust the substring positions to fit your flat file):
WITH cte AS (
    SELECT
        CAST(SUBSTRING(txtline,2,10) AS DATE) AS PostingDate,
        CAST(REPLACE(REPLACE(SUBSTRING(txtline,15,100),'|',''),',','') AS NUMERIC(19,4)) AS LCAmount
    FROM tmp
    WHERE ISDATE(SUBSTRING(txtline,2,10)) = 1
)
SELECT
    YEAR(PostingDate),
    SUM(LCAmount)
FROM cte
GROUP BY YEAR(PostingDate)
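For comparison, the same filter-then-sum logic can be sketched outside the database in Python. This is only an illustration under assumptions: the function name yearly_totals is hypothetical, the lines are assumed to look like the sample in the question, and rows are split on the pipe character instead of fixed positions.

```python
from collections import defaultdict
from datetime import datetime
from decimal import Decimal

def yearly_totals(lines):
    """Sum the LC amounts per posting year, ignoring all non-data lines."""
    totals = defaultdict(Decimal)
    for line in lines:
        parts = [p.strip() for p in line.strip().strip("|").split("|")]
        if len(parts) != 2:
            continue  # section titles and horizontal rules
        try:
            posting_date = datetime.strptime(parts[0], "%m/%d/%Y")
        except ValueError:
            continue  # the "Posting Date | LC amnt" header row
        # Drop thousands separators before converting the amount.
        totals[posting_date.year] += Decimal(parts[1].replace(",", ""))
    return dict(totals)

# The sample file content from the question.
sample = """\
G/L Account 4544000 Recruiting/Job Search
Company Code 0020
--------------------------
| Posting Date| LC amnt|
|------------------------|
| 01/01/2013 | 406.25 |
| 02/01/2013 | 283.33 |
| 03/21/2013 |1,517.18 |
--------------------------
G/L Account 4544000 Recruiting/Job Search
Company Code 0020
--------------------------
| Posting Date| LC amnt|
|------------------------|
| 05/01/2013 | 406.25 |
| 06/01/2013 | 283.33 |
| 07/21/2013 |1,517.18 |
--------------------------
"""
totals = yearly_totals(sample.splitlines())
```

Splitting on the pipe avoids having to tune character positions, and Decimal keeps the currency amounts exact.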

Maybe you could open the flat file in MS Excel, using the pipe character as the delimiter, and then create a CSV from that, if needed.

Short of a script task/component (or a full-blown custom SSIS component), I don't think you'll be able to parse that specific format in SSIS. The Flat File Connection Manager does allow you to select how many rows of your text file are headers to be skipped, but the format you're showing has multiple sections (and thus multiple headers). There's also the issue of the horizontal lines, which the Flat File Connection won't be able to properly handle.
I'd first see if there's any way to get a normal CSV file with this data out of SAP. If that turns out to be impossible, then you'll need some sort of custom code to strip out the excess text.
