Teradata SQL Assistant - How can I pivot or transpose large tables with many columns and many rows? - sql

I am using Teradata SQL Assistant Version TD 16.10.06.01 ...
I have seen a lot people transpose data for set smallish tables but I am working on thousands of clients and need the break the columns up into Line Item Values to compare orders/highlight differences between orders. Problem is it is all horizontally linked and I need to transpose it to Id,Transaction id,Version and Line Item Value 1, Line Item Value 2... then another column comparing values to see if they changed.
example:
+----+------------+-----------+------------+----------------+--------+----------+----------+------+-------------+
| Id | First Name | Last Name | DOB | transaction id | Make | Location | Postcode | Year | Price |
+----+------------+-----------+------------+----------------+--------+----------+----------+------+-------------+
| 1 | John | Smith | 15/11/2001 | 1654654 | Audi | NSW | 2222 | 2019 | $ 10,000.00 |
| 2 | Mark | White | 11/02/2002 | 1661200 | BMW | WA | 8888 | 2016 | $ 8,999.00 |
| 3 | Bob | Grey | 10/05/2002 | 1667746 | Ford | QLD | 9999 | 2013 | $ 3,000.00 |
| 4 | Phil | Faux | 6/08/2002 | 1674292 | Holden | SA | 1111 | 2000 | $ 5,800.00 |
+----+------------+-----------+------------+----------------+--------+----------+----------+------+-------------+
hoping to change the data to :
+----+----------+----------+----------+----------------+----------+----------+----------------+---------+-----+
| id | trans_id | Vers_ord | Item Val | Ln_Itm_Dscrptn | Org_Val | Updt_Val | Amndd_Ord_chck | Lbl_Rnk | ... |
+----+----------+----------+----------+----------------+----------+----------+----------------+---------+-----+
| 1 | 1654654 | 2 | 11169 | Make | Audi BLK | Audi WHT | Yes | 1 | |
| 1 | 1654654 | 2 | 11189 | Location | NSW | WA | Yes | 2 | |
| 1 | 1654654 | 2 | 23689 | Postcode | 2222 | 6000 | Yes | 3 | |
+----+----------+----------+----------+----------------+----------+----------+----------------+---------+-----+
Recently with smaller data I created a table added in Values then used a case statement when value 1 then xyz with a product join ... and the data warehouse admins didn't mention anything out of order. but I only had row 16 by 200 column table to transpose ( Sum, Avg, Count, Median(function) x 4 subsets of clients) , which were significantly smaller than my current tables to make comparisons with.
I am worried my prior method will probably slow the data Warehouse down, plus take me significant amount of time to type the SQL.
Is there a better way to transpose large tables?

Related

Compare 2 data frames with a tolerance

I have 2 data frames, each containing 4 columns and hundreds of rows. Although the columns are the same, the rows may appear in any order. I need to reconcile these data frames and ensure that everything in DF1 is also in DF2 and vice versa, and highlight any rows that do not appear in both data frames.
However, the final column, "Net Price", will sometimes not be the exact same figure in each data frame. I need a tolerance on it of $1. that is, Net Cash can differ by $1 and still be accepted. I have attached 2 simplified data frames.
DF_Client
| Client | Broker | Stock | Qty | Net Cash |
| John | Bank 1 | FMG | 500 | 32,456.4532 |
| Charlie | Bank 2 | CBA | 1,200 | 37,783.3740 |
| Paul | Bank 3 | TLS | 780 | 210,237.9830 |
| Richard | Bank 3 | WOW | 4,921 | 25,119.9952 |
| John | Bank 1 | FMG | 1,595 | 545.8500 |
DF_Broker
| Client | Broker | Stock | Qty | Net Cash |
| Richard | Bank 3 | WOW | 4,921 | 25,119.8603 |
| John | Bank 1 | FMG | 1,595 | 546.0000 |
| Charlie | Bank 2 | CBA | 1,200 | 37,783.5892 |
| Paul | Bank 3 | TLS | 780 | 210,237.7521 |
| John | Bank 1 | FMG | 500 | 32,456.6600 |
I have tried to merge, however, because the net cash column is not always exact, it will not match them.
df_compare = pd.merge(df_Client, df_Broker, how="outer", indicator="Missing")

Designing a database for a workout tracker

I'm designing a database for a workout tracker app. Each user should be able to track multiple workouts (routines). A workout can have multiple exercises an exercise can be used in many workouts. Each exercise will have a specific track type (weight and reps, distance and time, only reps).
My tables so far:
| User | |
|------|-------|
| id | name |
| 1 | Ilka |
| 2 | James |
| Exercise | | |
|----------|---------------------|---------------|
| id | name | track_type_id |
| 1 | Barbell Bench Press | 1 |
| 2 | Squats | 1 |
| 3 | Deadlifts | 1 |
| 4 | Rowing Machine | 3 |
| Workout | | |
|---------|---------|-----------------|
| id | user_id | name |
| 1 | 1 | Chest & Triceps |
| 2 | 1 | Legs |
| Workout_Exerice (Junction table) | |
|-----------------|------------------|------------|
| id | exersice_id | workout_id |
| 1 | 1 | 1 |
| 2 | 2 | 1 |
| 3 | 4 | 1 |
| Workout_Sets | | | |
|--------------|---------------------|------|--------|
| id | workout_exersice_id | reps | weight |
| 1 | 1 | 12 | 120 |
| 2 | 1 | 10 | 120 |
| 3 | 1 | 8 | 120 |
| 4 | 2 | 10 | 220 |
| 5 | 3 | null | null |
| TrackType | |
|-----------|-----------------|
| id | name |
| 1 | Weight and Reps |
| 2 | Reps Only |
| 3 | Distance Time |
My issue is how to incorporate the TrackType table for each workout set, my first option was to create columns in the Workout_Sets table for each tracking type (weight and reps, distance and time, only reps) but that means for many rows I will have many nulls. Another option I thought was to use an EAV type table but I'm not sure. Also do you think my design is efficient (Over-normalization)?
I would say that the most efficient way is to have nulls in your table. The alternative would require you to split many of the category's into separate tables. Also a recommendation is that you start factoring a User ID table into your database
Your description states that “Each exercise will have a specific track type” suggesting a one-to-one relationship between Exercise and TrackType, and that the relationship is unchanging. As such, the exercise table should have a TrackType column.
I suspect, however, that your problem description may be lacking specificity, making it difficult to give you sound advice. For instance, if the TrackType can vary for any given exercise, your TrackType column may belong on the Workout_Sets table. If the relationship between TrackType and Exercise/Workout_Sets is many-to-many, then you will need another junction table.
Your question regarding “over-normalization” depends upon many factors that are specific to your solution. In general, I would say no - the degree of normalization appears to be appropriate.

Transform table from sequential identifier to real with attributes

I changed a but the context, but it's basically the same issue.
Imagine we are in a never-ending tunnel, shaped like a circle. We split every section of the circle, from 1 to 10 and we'll call each section slot (sl). There are 2 groups (gr) of living things walking in the tunnel. Each group has 2 bands, where each has a name and global hitpoints (hp). Every group is walking forward (although the bands might change order). If a group is at slot #10 and moves forward, he will be at slot #1. We snapshot their information every day. All the data gathered is stored in a table with this structure:
+----------+----------------+------------------+----------------+----------------+------------------+----------------+----------------+------------------+----------------+----------------+------------------+--------------+--+
| day_id | | gr_1_sl_1_id | | gr_1_sl_1_name | | gr_1_sl_1_hp | | gr_1_sl_2_id | | gr_1_sl_2_name | | gr_1_sl_2_hp | | gr_2_sl_1_id | | gr_2_sl_1_name | | gr_2_sl_1_hp | | gr_2_sl_2_id | | gr_2_sl_2_name | | gr_2_sl_2_hp | |
+----------+----------------+------------------+----------------+----------------+------------------+----------------+----------------+------------------+----------------+----------------+------------------+--------------+--+
| 1 | 3 | orc | 100 | 4 | goblin | 10 | 10 | human | 50 | 1 | dwarf | 25 | |
| 2 | 6 | goblin | 7 | 7 | orc | 76 | 2 | human | 60 | 3 | dwarf | 28 | |
+----------+----------------+------------------+----------------+----------------+------------------+----------------+----------------+------------------+----------------+----------------+------------------+--------------+--+
As you can see, the columns are structured in a sequential way, while the data shows what is the actual value. What I want is to have the information shaped this way instead:
+---------+-------+-------+-----------+---------+
| id_game | gr_id | sl_id | band_name | band_hp |
+---------+-------+-------+-----------+---------+
| 1 | 1 | 3 | orc | 100 |
| 1 | 1 | 4 | goblin | 10 |
| 1 | 2 | 10 | human | 50 |
| 1 | 2 | 1 | dwarf | 25 |
| 2 | 1 | 6 | goblin | 7 |
| 2 | 1 | 7 | orc | 76 |
| 2 | 2 | 2 | human | 60 |
| 2 | 2 | 3 | dwarf | 28 |
+---------+-------+-------+-----------+---------+
I have this information in power bi, although I can create views in sql server if need be. I have tried many things, closest thing I got was unpivoting and parsing the original columns to get day_id, gr_id, sl_id, attributes and values. In attributes and values, it's basically name and hp with their corresponding value (I changed hp into string), but then I'm stocked, I'm not sure what to do next.
Anyone has any ideas ? Keep in mind that I oversimplified the problem; there are more groups, more slots, more bands and more statistics (i.e. attack and defense rating, etc.)
You seem to want to unpivot the table. In SQL Server, I recommend using apply:
select t.day_id, v.*
form t cross apply
(values (1, 1, gr_1_sl_1_id, gr_1_sl_1_name, gr_1_sl_1_hp),
(1, 2, gr_1_sl_2_id, gr_1_sl_2_name, gr_1_sl_2_hp),
(2, 1, gr_2_sl_1_id, gr_1_sl_1_name, gr_2_sl_1_hp),
(2, 2, gr_2_sl_2_id, gr_1_sl_2_name, gr_2_sl_2_hp)
) v(id_game, gr_id, sl_id, band_name, band_hp);
In other databases, you can do something similar with union all.

Outer Join multible tables keeping all rows in common colums

I'm quite new to SQL - hope you can help:
I have several tables that all have 3 columns in common: ObjNo, Date(year-month), Product.
Each table has 1 other column, that represents an economic value (sales, count, netsales, plan ..)
I need to join all tables on the 3 common columns giving. The outcome must have one row for each existing combination of the 3 common columns. Not every combination exists in every table.
If I do full outer joins, I get ObjNo, Date, etc. for each table, but only need them once.
How can I achieve this?
+--------------+-------+--------+---------+-----------+
| tblCount | | | | |
+--------------+-------+--------+---------+-----------+
| | ObjNo | Date | Product | count |
| | 1 | 201601 | Snacks | 22 |
| | 2 | 201602 | Coffee | 23 |
| | 4 | 201605 | Tea | 30 |
| | | | | |
| tblSalesPlan | | | | |
| | ObjNo | Date | Product | salesplan |
| | 1 | 201601 | Beer | 2000 |
| | 2 | 201602 | Sancks | 2000 |
| | 5 | 201605 | Tea | 2000 |
| | | | | |
| | | | | |
| tblSales | | | | |
| | ObjNo | Date | Product | Sales |
| | 1 | 201601 | Beer | 1000 |
| | 2 | 201602 | Coffee | 2000 |
| | 3 | 201603 | Tea | 3000 |
+--------------+-------+--------+---------+-----------+
Thx
Devon
It sounds like you're using SELECT * FROM... which is giving you every field from every table. You probably only want to get the values from one table, so you should be explicit about which fields you want to include in the results.
If you're not sure which table is going to have a record for each case (i.e. there is not guaranteed to be a record in any particular table) you can use the COALESCE function to get the first non-null value in each case.
SELECT COALESCE(tbl1.ObjNo, tbl2.ObjNo, tbl3.ObjNo) AS ObjNo, ....
tbl1.Sales, tbl2.Count, tbl3.Netsales

Turn Parts of Rows into a Separate Column in SQL Server

I'm not sure if Pivot is the way to go with this, but I am looking to take part of a row and create a new column with it.
This is my example:
+--------+------------+--------+
| Person | PetName | PetAge |
+--------+------------+--------+
| 1 | Apple | 2 |
| 1 | Banana | 6 |
| 1 | Grapefruit | 3 |
| 2 | Red | 53 |
| 2 | Blue | 8 |
+--------+------------+--------+
This is my result/goal:
+--------+---------+--------+---------+--------+------------+--------+
| Person | PetName | PetAge | PetName | PetAge | PetName | PetAge |
+--------+---------+--------+---------+--------+------------+--------+
| 1 | Apple | 2 | Banana | 6 | Grapefruit | 3 |
| 2 | Red | 53 | Blue | 8 | | |
+--------+---------+--------+---------+--------+------------+--------+
How can I get the result from my example?
UPDATE: I just noticed that your table just had the Person in the first row.
I've done something similar. What I did was add a RowNumber per pet by person (OVER PARTITION BY PERSON) to the data. This will allow the data to be broken up and an order of numbers for each pet per person.
Make your normal table with just the PetName and PetAge.
Add a Tablix with just one column and row and put the previous table in it.
For the Column grouping, use ROW_NUM. For Row use Person.