Generate a new row for a set of fields of the input row (and generate a query for each new row) - pentaho

We have a .csv file that has information about the migration flows of people across districts in a city.
We are creating a transformation that loads data from the .csv file into a database (2 tables).
Each row has the following information:
- field 1: Name of the origin district
- field 2 (the field name is the name of a destination district): the value is the number of people who have moved from the origin district to this destination district
Field 2 repeats for each destination district.
Suppose there are 20 districts, so the total number of fields is 21.
We want a step that generates the following output (transform data structure):
A new row with the following structure:
Field 1: Name of the origin district
Field 2: Name of the destination district
Field 3: Number of people who have moved from district "Field 1" to district "Field 2"
So the output of this step must contain 20x20 rows. We will then insert the 400 rows in the following database table:
We cannot find any transformation step that can generate this new data structure. We will try the JavaScript step to manually implement a loop for each origin district and then generate the insert into the database table for each new row.

To turn values that are listed as columns in a single row (a pivoted table) into one row per column, plus a key column, you should use the Row Normaliser step.
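
For anyone who wants to see what that reshaping does to the data, here is a minimal Python sketch of the same idea outside Pentaho. It is not Pentaho code and not how the step is implemented; the column name `origin`, the three district names and the function `normalise_rows` are made up for illustration (the question has 20 districts).

```python
import csv
import io


def normalise_rows(csv_text, key_field):
    """Turn each wide row into one (origin, destination, people) row per destination column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    output = []
    for row in reader:
        origin = row[key_field]
        for field_name, value in row.items():
            if field_name == key_field:
                continue  # the key column is carried along, not unpivoted
            output.append((origin, field_name, int(value)))
    return output


# Hypothetical sample: 3 districts instead of 20, so 3 x 3 = 9 output rows.
SAMPLE = """origin,North,South,East
North,0,12,7
South,5,0,3
East,9,4,0
"""

rows = normalise_rows(SAMPLE, "origin")
for origin, destination, people in rows:
    print(origin, destination, people)
```

Each output tuple maps directly onto the three-field row described in the question, so the 400 rows can go to a normal table output step without any hand-written loop.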

Related

Merge tables in Power BI

I have a problem creating a table in Power BI/SQL.
Basically, I have a CSV file with a dataset of the crime reports from a designated year.
Each crime has a date (in 3 columns: day, month and year), a location (in coordinates, lat and long), a type of crime, a neighborhood and so on.
To make things less "dense" I created a few tables (for example, a "Location_ID" table with a PK and a combined lat and long for each ID), and did the same for dates, types of crime, neighborhoods, etc.
The thing is that now my main table is empty, and I need to "replace" each value with the aforementioned PK from each newly created table. For example, I have crime N°121, which happened in Buenos Aires, Argentina (that's "3" in the new table ID_LOCATION), on 4/3/2022 (that's "Z" in the new table ID_DATE), and so on.
I don't know how to reassign every value in a column to the correct new value from the tables that I created without doing it manually (there are over 80k entries, so it would take forever).
Thanks in advance

CSV/XLSX into SQL Table : Best Way based on the format

I have the following case
I have the following csv file format example:
Year;Ligue1;Ligue2;Ligue3
2017;Manchester;Burnley;Doncaster
2016;Chelsea;Aston Villa;Leeds
2015;Arsenal;Newcastle;Sheffield
What I would like to create is a table/view with one row for each year and each league.
For instance : Year : 2017 ; Ligue : League 1 ; Team : Manchester
My idea is the following.
First, bulk load the CSV file into the database : Bulk into ...
Once the data is loaded, I would iterate through all the records from the first column (Ligue1) to the last column (Ligue3) and insert these records into a specific view depending on the league classification.
For instance, I will create the following view:
View Football with just 3 columns : Year, League, Winner
Insert into Football (Year, League, Team), where League is always the name of the column (League 1, League 2 or League 3) and Team is the relevant winner for the specific year.
Final Result Example:
Year;League;Winner
2017;League1;Manchester
2017;League2;Burnley
2017;League3;Doncaster
Would bulk loading the CSV be the best approach?
How could I get the results and cases described in the second step? With counters / cursors...?

Pig Latin: using field in one table as position value to access data in another table

Let's say we have two tables. The first table has the following description:
animal_count: {zoo_name:chararray, counts:()}
The meaning of the "zoo_name" field is obvious. The "counts" field contains the counts of each specific animal species. In order to know which species a given field in the "counts" tuple represents, we use another table:
species_position : {species:chararray, position:int}
Let's assume we have the following data in the "species_position" table:
"tiger", 0
"elephant", 1
"lion", 2
This data means the first field in animal_count.counts is the number of tigers in a given zoo. The second field in that tuple is the number of elephants, and so on. So, if we want to represent the fact that "san diego zoo" has 2 tigers, 4 elephants and no lions, we will have the following data in the "animal_count" table:
"san diego zoo", (2, 4, 0)
Given this setup, how can I write a query to extract the number of a given species in all zoos? I was hoping for something like:
FOREACH species_position GENERATE species, animal_count.counts.$position;
Of course, the "animal_count.counts.$position" won't work.
Is this possible without resorting to UDF?

Multi-Row Per Record SQL Statement

I'm not sure this is possible but my manager wants me to do it...
Using the below picture as a reference, is it possible to retrieve a group of records, where each record has 2 rows of columns?
So columns: Number, Incident Number, Vendor Number, Customer Name, Customer Location, Status, Opened and Updated would be part of the first row and column: Work Notes would be a new row that spans the width of the report. Each record would have two rows. Is this possible with a GROUP BY statement?
Record 1
Row 1 = Number, Incident Number, Vendor Number, Customer Name, Customer Location, Status, Opened and Updated
Row 2 = Work Notes
Record 2
Row 1 = Number, Incident Number, Vendor Number, Customer Name, Customer Location, Status, Opened and Updated
Row 2 = Work Notes
Record n
...
I don't think that's possible with the built-in report engine. You'll need to export the data and format it using something else.
You could have something similar to what you want on short description (list report, group by short description), but you can't group by work notes so that's out.
One thing to note is that the work_notes field is not actually a field on the table, the work_notes field is of type journal_input, which means it's really just a gateway to the actual underlying data model. "Modifying" work_notes actually just inserts into sys_journal_field.
sys_journal_field is the table which stores the work notes you're looking for. Given a sys_id of an incident record, this URL will give you all journal field entries for that particular record:
/sys_journal_field_list.do?sysparm_query=name=task^element_id=<YOUR_SYS_ID>
You will notice this includes ALL journal fields (comments + work_notes + anything else), so if you just wanted work notes, you could simply add a query against element thusly:
/sys_journal_field_list.do?sysparm_query=name=task^element=work_notes^element_id=<YOUR_SYS_ID>
What this means for you!
While you can't separate a physical row into multiple logical rows in the UI, in the case of journal fields you can join your target table against the sys_journal_field table using a Database View. This deviates from your goal in that you wouldn't get a single row for all work notes, but rather an additional row for each matched work note.
Given an incident INC123 with 3 work notes, your report against the Database View would look kind of like this:
Row 1: INC123 | markmilly | This is a test incident |
Row 2: INC123 | | | Work note #1
Row 3: INC123 | | | Work note #2
Row 4: INC123 | | | Work note #3
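
To make the "one extra row per work note" behaviour concrete, here is a rough sketch of the join using plain SQLite from Python. It only models the idea and is not the platform's Database View. The `incident` table layout, the sys_id `abc`, and the `value` column holding the note text are assumptions made for illustration; `name`, `element` and `element_id` are the fields used in the URLs above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Simplified stand-ins for the real tables (layout is an assumption).
    CREATE TABLE incident (sys_id TEXT PRIMARY KEY, number TEXT, short_description TEXT);
    CREATE TABLE sys_journal_field (name TEXT, element TEXT, element_id TEXT, value TEXT);
""")

conn.execute("INSERT INTO incident VALUES ('abc', 'INC123', 'This is a test incident')")
conn.executemany(
    "INSERT INTO sys_journal_field VALUES ('task', ?, 'abc', ?)",
    [
        ("work_notes", "Work note #1"),
        ("work_notes", "Work note #2"),
        ("comments", "A customer-visible comment"),  # filtered out below
        ("work_notes", "Work note #3"),
    ],
)

# Join the incident to its journal entries, keeping only work notes.
report = conn.execute("""
    SELECT i.number, j.value
    FROM incident AS i
    JOIN sys_journal_field AS j ON j.element_id = i.sys_id
    WHERE j.element = 'work_notes'
    ORDER BY j.value
""").fetchall()

for number, note in report:
    print(number, "|", note)
```

The incident columns repeat on every joined row, so the report has one row per work note rather than a single row holding all of them.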

SQL - updating a table using a stored procedure

I have a table of zip codes and a stored procedure to calculate all zipcodes within an X radius, given a zip code and a radius.
For example, to find all zip codes within 200 miles of 10001 I'd enter CALL zip(10001,200) and it would display each zip code.
In a new column "hradius", I would like to have all zip codes within 200 miles of that row's zip code.
I'm very new to SQL, thank you for any help.
Don't shove a string with multiple values into one field. Create a related table to link one zip code to multiple others:
ZipOrigin ZipDest Distance
12345 23456 150
12345 34567 175
...
(Distance is optional - for example you could use it to find all zip codes within ANY radius less than X)
In this situation, if you want to pre-generate your list of matches, you're much better off using a separate table for the matches. You'll have two tables: one for your zip codes and one for the matches. The second table will have two columns, one for the source zip code and one for the matching zip code within X miles (200 in this case). There will be a separate row for each match. The results from the stored procedure should output to the second table. Once you have that you can use a query like the following:
SELECT zip.zipcode, zipJoin.zipcode
FROM zipCodes zip
INNER JOIN zipCodeMatches zipJoin
ON zip.zipcode = zipJoin.sourceZipCode
WHERE zip.zipcode = #zip
You should spend some time learning about proper table design and normalization and how to join tables together to help you understand these concepts.
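
As a rough illustration of the two-table layout described above, here is a small sketch using SQLite from Python. The table and column names follow the query in the answer; the `distance` column, the sample rows (taken from the ZipOrigin/ZipDest example) and the helper `zips_near` are assumptions for the demo, and in practice the match table would be filled from the stored procedure's output instead of by hand.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE zipCodes (zipcode TEXT PRIMARY KEY);
    -- One row per (source, match) pair instead of a delimited string column.
    CREATE TABLE zipCodeMatches (
        sourceZipCode TEXT NOT NULL REFERENCES zipCodes(zipcode),
        zipcode       TEXT NOT NULL REFERENCES zipCodes(zipcode),
        distance      REAL,
        PRIMARY KEY (sourceZipCode, zipcode)
    );
""")

conn.executemany("INSERT INTO zipCodes VALUES (?)",
                 [("12345",), ("23456",), ("34567",)])
conn.executemany("INSERT INTO zipCodeMatches VALUES (?, ?, ?)",
                 [("12345", "23456", 150), ("12345", "34567", 175)])


def zips_near(conn, zip_code):
    """Return the pre-computed matches for one source zip code."""
    cursor = conn.execute("""
        SELECT zipJoin.zipcode
        FROM zipCodes zip
        INNER JOIN zipCodeMatches zipJoin
            ON zip.zipcode = zipJoin.sourceZipCode
        WHERE zip.zipcode = ?
        ORDER BY zipJoin.zipcode
    """, (zip_code,))
    return [row[0] for row in cursor]


print(zips_near(conn, "12345"))
```

Because every match is its own row, adding a `distance < ?` condition to the WHERE clause would be enough to narrow the result to a smaller radius.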