Pandas - Pivot and Rearrange Table With Multiple Labels in Same Header - pandas

I have an xlsx file with tabs for multiple years of data. Each tab contains a table with many columns and the table is structured like this:
| City | State | Number of Drivers, 2019 | Number of Cars, 2019 |
| LA | CA | 123 | 10.0 |
| San Diego | CA | 456 | 2345 |
I would like to rearrange the table to look like this, and do it for each tab in the xlsx:
| City | State | Year | Measure Name | Measure Value |
| LA | CA | 2019 | Number of Drivers | 123 |
| San Diego | CA | 2019 | Number of Drivers | 456 |
| LA | CA | 2019 | Number of Cars | 10 |
| San Diego | CA | 2019 | Number of Cars | 2345 |
There are a lot of moving pieces to this, and it has been a little tricky to get the final formatting correct.

We can melt, then join with the result of str.split on the variable column, which gives:
        City State                 variable   value                  0      1
0         LA    CA  Number of Drivers, 2019   123.0  Number of Drivers   2019
1  San Diego    CA  Number of Drivers, 2019   456.0  Number of Drivers   2019
2         LA    CA     Number of Cars, 2019    10.0     Number of Cars   2019
3  San Diego    CA     Number of Cars, 2019  2345.0     Number of Cars   2019
# If you need to change the column names, add .rename(columns={}) at the end.
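The melt-then-split approach described above can be sketched as follows. This is a minimal example built on the sample rows from the question; the final column names ("Measure Name", "Measure Value", "Year") come from the desired output, and stripping whitespace plus casting the year to int are assumptions about how the result should be cleaned up.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "City": ["LA", "San Diego"],
    "State": ["CA", "CA"],
    "Number of Drivers, 2019": [123, 456],
    "Number of Cars, 2019": [10.0, 2345],
})

# Wide to long: one row per (City, State, original column header).
long_df = df.melt(id_vars=["City", "State"])

# Split "Number of Drivers, 2019" into the measure name and the year.
parts = long_df["variable"].str.split(",", expand=True)
long_df["Measure Name"] = parts[0].str.strip()
long_df["Year"] = parts[1].str.strip().astype(int)

# Rename and reorder to match the layout requested in the question.
result = long_df.rename(columns={"value": "Measure Value"})[
    ["City", "State", "Year", "Measure Name", "Measure Value"]
]
print(result)
```

Splitting on the comma works here because every measure header follows the "name, year" pattern; headers with extra commas would need a different split rule.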

This is how I was able to apply Yoben's solution to every tab in the xlsx file, append the results together, and write the full table to a .csv:
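import pandas as pd

# sheet_name=None returns a dict of {sheet name: DataFrame}, one entry per tab
sheets_dict = pd.read_excel(r'file.xlsx', sheet_name=None)
frames = []
for name, sheet in sheets_dict.items():
    sheet['sheet'] = name
    # keep 'sheet' as an id column so it is not melted along with the measures
    sheet = sheet.melt(['City', 'State', 'sheet'])
    sheet = sheet.join(sheet.variable.str.split(',', expand=True))
    frames.append(sheet)
# DataFrame.append was removed in pandas 2.0, so concatenate once at the end
full_table = pd.concat(frames, ignore_index=True)
full_table.to_csv('Full Table.csv')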


PowerBI / SQL Query to verify records

I am working on a PowerBI report that is grabbing information from SQL and I cannot find a way to solve my problem using PowerBI or how to write the required code. My first table, Certifications, includes a list of certifications and required trainings that must be obtained in order to have an active certification.
My second table, UserCertifications, includes a list of UserIDs, certifications, and the trainings associated with a certification.
How can I write SQL code or a PowerBI measure to tell if a user has all required trainings for a certification? That is, if UserID 1 has the A certification, how can I verify that they have the TrainingIDs 1, 10, and 150 associated with it?
This is a DAX pattern for testing whether a set of values contains at least a given list of values.
| Certifications |
| Certification | TrainingID |
| A | 1 |
| A | 10 |
| A | 150 |
| B | 7 |
| B | 9 |
| UserCertifications |
| UserID | Certification | Training |
| 1 | A | 1 |
| 1 | A | 10 |
| 1 | A | 300 |
| 2 | A | 150 |
| 2 | B | 9 |
| 2 | B | 90 |
| 3 | A | 7 |
| 4 | A | 1 |
| 4 | A | 10 |
| 4 | A | 150 |
| 4 | A | 1000 |
In the above scenario, DAX needs to find out whether the mandatory trainings (Certifications[TrainingID]) for each Certifications[Certification] have been completed within each UserCertifications[UserID] && UserCertifications[Certification] partition.
DAX should return true only for UserCertifications[UserID] = 4, as it is the only user that completed at least all the mandatory trainings.
The way to achieve this is through the following measure:
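areAllMandatoryTrainingCompleted =
VAR _alreadyCompleted =
    CONCATENATEX (
        UserCertifications, UserCertifications[Training], "-", UserCertifications[Training]
    ) // what is completed in the fact table; the fourth argument is very important as it decides the sort order
VAR _0 =
    MAX ( UserCertifications[Certification] )
VAR _supposedToComplete =
    CONCATENATEX (
        FILTER ( Certifications, Certifications[Certification] = _0 ),
        Certifications[TrainingID], "-", Certifications[TrainingID]
    ) // what must be completed per the training table; the fourth argument is very important as it decides the sort order
// the "-" delimiter and the CONCATENATEX column arguments are assumptions; they must match in both variables
VAR _isMandatoryTrainingCompleted =
    CONTAINSSTRING ( _alreadyCompleted, _supposedToComplete ) // CONTAINSSTRING ( <within text>, <search text> ) returns TRUE or FALSE
RETURN
    _isMandatoryTrainingCompleted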
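
As a cross-check of the expected result, the same question ("does this user hold every required training for this certification?") can be expressed as a subset test. The sketch below is plain Python over the sample rows from the question, not DAX, and is only meant to illustrate the logic; the variable names are made up for the example. A subset test does not depend on how concatenated strings happen to line up.

```python
from collections import defaultdict

# (Certification, TrainingID) rows from the Certifications table.
certifications = [("A", 1), ("A", 10), ("A", 150), ("B", 7), ("B", 9)]

# (UserID, Certification, Training) rows from the UserCertifications table.
user_certifications = [
    (1, "A", 1), (1, "A", 10), (1, "A", 300),
    (2, "A", 150), (2, "B", 9), (2, "B", 90),
    (3, "A", 7),
    (4, "A", 1), (4, "A", 10), (4, "A", 150), (4, "A", 1000),
]

# Required trainings per certification.
required = defaultdict(set)
for cert, training in certifications:
    required[cert].add(training)

# Completed trainings per (user, certification) pair.
completed = defaultdict(set)
for user, cert, training in user_certifications:
    completed[(user, cert)].add(training)

# A pair passes when the required set is a subset of the completed set.
status = {key: required[key[1]] <= done for key, done in completed.items()}
for (user, cert), ok in sorted(status.items()):
    print(user, cert, ok)
```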

How to search using a delimited string as array in query

I am trying to search for records whose columns match a value within a delimited string.
I have two tables that look like this
| Id | Make | Model |
| 1 | Ford | Focus |
| 2 | Ford | GT |
| 3 | Ford | Kuga |
| 4 | Audi | R8 |
| Id | Makes | Models |
| 1 | Ford | GT,Focus |
| 2 | Audi | R8 |
What I'm trying to achieve is the following:
| Id | Makes | Models | Matched_Count |
| 2 | Audi | R8 | 1 |
| 1 | Ford | GT,Focus | 2 |
Using the following query I can get matches on singular strings, but I'm not sure how I can split the commas to search for individual models.
select Id, Makes, Models, (select count(id) from Vehicles va where UPPER(sa.Makes) = UPPER(va.Make) AND UPPER(sa.Models) = UPPER(va.Model)) as Matched_Count
from Monitor sa
(I am using SQL Server 2016; however, I do not have access to create custom functions or variables.)
If you are stuck with this data model, you can use string_split():
select m.*, v.matched_count
from monitor m outer apply
     (select count(*) as matched_count
      from string_split(m.models, ',') s join
           vehicles v
           on s.value = v.model and m.makes = v.make
     ) v;
I would advise you to put your efforts into fixing the data model, though.
Here is a db<>fiddle.
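
To make the split-and-count logic concrete, here is a small sketch in plain Python over the sample rows from the question. It is not a replacement for the SQL above; it only shows what the query computes. The upper-casing mirrors the UPPER() calls in the question, and trimming whitespace around each model is an added assumption.

```python
from collections import Counter

# (Id, Make, Model) rows from the Vehicles table.
vehicles = [(1, "Ford", "Focus"), (2, "Ford", "GT"), (3, "Ford", "Kuga"), (4, "Audi", "R8")]

# (Id, Makes, Models) rows from the Monitor table; Models is comma-delimited.
monitor = [(1, "Ford", "GT,Focus"), (2, "Audi", "R8")]

# Count vehicle rows per (make, model), compared case-insensitively.
vehicle_counts = Counter((make.upper(), model.upper()) for _, make, model in vehicles)

def matched_count(makes, models):
    """Number of vehicle rows matching any model in the delimited string."""
    return sum(
        vehicle_counts[(makes.upper(), model.strip().upper())]
        for model in models.split(",")
    )

result = [(mid, makes, models, matched_count(makes, models)) for mid, makes, models in monitor]
for row in result:
    print(row)
```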

SQL Group by Client Location

Sample of Data I am trying to manipulate
Order | OrderDate | ClientName| ClientAddress | City | State| Zip |
CO101 | 1/5/2015 | Client ABC| 101 Park Drive| Boston | MA | 02134 |
C0102 | 2/6/2015 | Client ABC| 101 Park Drive| Boston | MA | 02134 |
C0103 | 1/7/2015 | Client ABC| 354 Foo Pkwy | Dallas | TX | 75001 |
C0104 | 3/7/2015 | Client ABC| 354 Foo Pkwy | Dallas | TX | 75001 |
C0105 | 5/7/2015 | Client XYZ| 1 Binary Road | Austin | TX | 73301 |
C0106 | 1/8/2015 | Client XYZ| 1 Binary Road | Austin | TX | 73301 |
C0107 | 7/9/2015 | Client XYZ| 51 Testing Rd | Austin | TX | 73301 |
I have a database set up in MS-SQL Server with all client orders for the past two years. Some clients only have one location, others have multiple locations. I would like to write a script that will show me the number of orders a customer placed by location over the total number of weeks in which there was at least one order.
Based on the results of this script, I would like to be able to deduce every customer location's summary of unique orders (placed at various times). For example:
Client ABC has placed 45 orders over 35 total weeks at location A
Client ABC has placed 35 orders over 15 total weeks at location B
Client ABC has placed 15 orders over 15 total weeks at location C
I would like to see this information for each unique location for each client. I am not sure how to aggregate the data in such a way. Here is where I am at with my script:
SELECT t1.ClientName, (SELECT DISTINCT t2.ClientAddress), COUNT(DISTINCT t2.Orders) AS TotalOrders,
DATEPART(week, t1.OrderDate) AS Week
FROM database t1
INNER JOIN database t2 on t1.Orders = t2.Orders
GROUP BY DATEPART(week, t1.OrderDate), t1.ClientAddress, t2.ClientAddress
The results that I get show me the unique orders by location by week, but I'm not sure how to count the number of weeks in the way that I need; I have tried writing subqueries but I keep running into issues. I realize that in this script I am showing the number of orders by location for each individual week; I would like to count the total number of weeks within the time frame in which there is at least one order.
The results structure is as follows:
| ClientName| ClientAddress | TotalOrders | Week |
|Client ABC |101 Park Drive | 30 | 21 |
|Client ABC |101 Park Drive | 29 | 13 |
|Client ABC |101 Park Drive | 28 | 10 |
|Client XYZ |1 Binary Road | 27 | 19 |
|Client XYZ |1 Binary Road | 25 | 7 |
|Client XYZ |51 Testing Rd | 22 | 9 |
Any and all help would be greatly appreciated; thank you in advance.
Isn't this what you want?
SELECT t1.ClientName, t1.ClientAddress, COUNT(DISTINCT t1.Orders) AS TotalOrders,
       COUNT(DISTINCT DATEPART(week, t1.OrderDate)) AS Weeks
FROM database t1
INNER JOIN database t2 ON t1.Orders = t2.Orders
GROUP BY t1.ClientName, t1.ClientAddress
I don't really follow why you're doing a self-join. It seems unnecessary to me, but I left it in, just in case, and to keep the focus on the one change I made to get your result.
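
The grouping idea can be tried end to end with a small sketch. It uses Python's built-in sqlite3 module rather than SQL Server, so `strftime('%Y-%W', ...)` stands in for `DATEPART(week, ...)`; the table and column names are made up for the example, and the dates are assumed to be in month/day/year format. Including the year in the week key is a deliberate choice here: over a two-year period, a bare week number would count week 5 of both years as a single week.

```python
import sqlite3
from datetime import datetime

# (Order, OrderDate, ClientName, ClientAddress) rows from the question's sample.
rows = [
    ("CO101", "1/5/2015", "Client ABC", "101 Park Drive"),
    ("C0102", "2/6/2015", "Client ABC", "101 Park Drive"),
    ("C0103", "1/7/2015", "Client ABC", "354 Foo Pkwy"),
    ("C0104", "3/7/2015", "Client ABC", "354 Foo Pkwy"),
    ("C0105", "5/7/2015", "Client XYZ", "1 Binary Road"),
    ("C0106", "1/8/2015", "Client XYZ", "1 Binary Road"),
    ("C0107", "7/9/2015", "Client XYZ", "51 Testing Rd"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id TEXT, order_date TEXT, client_name TEXT, client_address TEXT)"
)
# Store dates as ISO strings so SQLite's date functions can read them.
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [(o, datetime.strptime(d, "%m/%d/%Y").date().isoformat(), c, a) for o, d, c, a in rows],
)

# One row per client location: distinct orders and distinct year-weeks with an order.
summary = conn.execute(
    """
    SELECT client_name, client_address,
           COUNT(DISTINCT order_id) AS total_orders,
           COUNT(DISTINCT strftime('%Y-%W', order_date)) AS weeks_with_orders
    FROM orders
    GROUP BY client_name, client_address
    ORDER BY client_name, client_address
    """
).fetchall()
for row in summary:
    print(row)
```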

Can I merge SQL Server tables if they have not exactly the same structure?

I have two tables, source and target.
| Name | Year | City |
| Toyota | 2002 | Los Angeles |
| Seat | 2012 | Madrid |
| ID | Name | Year | City |
| 1 | Bentley | 1969 | Budapest |
| 2 | Toyota | 1988 | New York |
| 3 | Ford | 2001 | Tokyo |
| 4 | Seat | 1995 | Madrid |
| 5 | Bugatti | 1995 | London |
I want to merge source into target. I know the MERGE command, it's fine. The issue is that the source has no column ID so that it won't match.
Since the Name column is unique in both tables, I only need to match on it: if the name does not exist in target, insert it; if it exists, update target.
I could do it using a NOT EXISTS statement, but we are talking about billions of rows, so MERGE would be a much quicker solution.
So can I somehow set the MERGE command to take only that column into account when matching?
Yes, you can:
MERGE target t
USING source s
ON t.Name = s.Name
WHEN NOT MATCHED THEN
    INSERT (Name, Year, City)
    VALUES (s.Name, s.Year, s.City)
WHEN MATCHED THEN
    UPDATE SET Year = s.Year,
               City = s.City;
If the ID column in target is not an IDENTITY column, you can create a sequence to populate it.
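
The match-on-Name behaviour can be tried out with a small runnable sketch. SQLite (via Python's built-in sqlite3 module) has no MERGE statement, so this uses its `INSERT ... ON CONFLICT DO UPDATE` upsert instead, which needs SQLite 3.24 or later and a UNIQUE constraint on Name. The 'Honda' source row is hypothetical, added only to exercise the insert branch; the other rows come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- INTEGER PRIMARY KEY is auto-assigned in SQLite, playing the role of IDENTITY.
    CREATE TABLE target (ID INTEGER PRIMARY KEY, Name TEXT UNIQUE, Year INTEGER, City TEXT);
    CREATE TABLE source (Name TEXT, Year INTEGER, City TEXT);

    INSERT INTO target (ID, Name, Year, City) VALUES
        (1, 'Bentley', 1969, 'Budapest'),
        (2, 'Toyota', 1988, 'New York'),
        (3, 'Ford', 2001, 'Tokyo'),
        (4, 'Seat', 1995, 'Madrid'),
        (5, 'Bugatti', 1995, 'London');

    INSERT INTO source (Name, Year, City) VALUES
        ('Toyota', 2002, 'Los Angeles'),
        ('Seat', 2012, 'Madrid'),
        ('Honda', 2020, 'Osaka');  -- hypothetical row, not in the question
    """
)

# Insert new names; on a Name collision, update Year and City instead.
# "WHERE true" avoids a parsing ambiguity between SELECT and ON CONFLICT.
conn.execute(
    """
    INSERT INTO target (Name, Year, City)
    SELECT Name, Year, City FROM source WHERE true
    ON CONFLICT(Name) DO UPDATE SET Year = excluded.Year, City = excluded.City
    """
)

rows = conn.execute("SELECT ID, Name, Year, City FROM target ORDER BY ID").fetchall()
for row in rows:
    print(row)
```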

SQL query for many-to-many self-join

I have a database table that has a companion many-to-many self-join table alongside it. The primary table is part and the other table is alternate_part (basically, alternate parts are identical to their main part with different #s). Every record in the alternate_part table is also in the part table. To illustrate:
| part_id | part_number | description |
| 1 | 00001 | wheel |
| 2 | 00002 | tire |
| 3 | 00003 | window |
| 4 | 00004 | seat |
| 5 | 00005 | wheel |
| 6 | 00006 | tire |
| 7 | 00007 | window |
| 8 | 00008 | seat |
| 9 | 00009 | wheel |
| 10 | 00010 | tire |
| 11 | 00011 | window |
| 12 | 00012 | seat |
| main_part_id | alt_part_id |
| 1 | 5 | // Wheel
| 5 | 1 | // |
| 5 | 9 | // |
| 9 | 5 | // |
| 2 | 6 | // Tire
| 6 | 2 | // |
| ... | ... | // |
I am trying to produce a simple SQL query that will give me a list of all alternates for a main part. The tricky part is: some alternates are only listed as alternates of alternates, it is not guaranteed that every viable alternate for a part is listed as a direct alternate. e.g., if 'Part 3' is an alternate of 'Part 2' which is an alternate of 'Part 1', then Part 3 is an alternate of Part 1 (even if the alternate_part table doesn't list a direct link). The reverse is also true (Part 1 is an alternate of Part 3).
Basically, right now I'm pulling alternates and iterating through them
SELECT p.*, ap.*
FROM part p
INNER JOIN alternate_part ap ON p.part_id = ap.main_part_id
And then going back and doing the same again on those alternates. But, I think there's got to be a better way.
The SQL query I'm looking for will basically give me:
| part_id | alt_part_id |
| 1 | 5 |
| 1 | 9 |
For part_id = 1, even when 1 & 9 are not explicitly linked in the alternates table.
Note: I have no control whatever over the structure of the DB, it is a distributed software solution.
Note 2: It is an Oracle platform, if that affects syntax.
You have to build a hierarchical tree, probably with a CONNECT BY PRIOR query using NOCYCLE, something like this:
select distinct p.part_id, p.part_number, p.description, c.main_part_id
from part p
left join (
    select main_part_id, connect_by_root(main_part_id) real_part_id
    from alternate_part
    connect by nocycle prior main_part_id = alt_part_id
) c
    on p.part_id = c.real_part_id and p.part_id != c.main_part_id
order by p.part_id
You can read full documentation about Hierarchical queries at
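
To illustrate what the hierarchical query is meant to compute, here is a small sketch in plain Python rather than Oracle SQL. It treats the alternate_part rows as an undirected graph and walks it breadth-first, so alternates of alternates are picked up; the function name `all_alternates` is made up for the example, and the rows are the ones listed in the question.

```python
from collections import defaultdict, deque

# (main_part_id, alt_part_id) rows from the question's alternate_part table.
alternate_part = [(1, 5), (5, 1), (5, 9), (9, 5), (2, 6), (6, 2)]

def all_alternates(part_id, links):
    """Return every part reachable from part_id through alternate links."""
    graph = defaultdict(set)
    for main_id, alt_id in links:
        # The question says the relationship holds in both directions.
        graph[main_id].add(alt_id)
        graph[alt_id].add(main_id)

    seen = {part_id}
    queue = deque([part_id])
    while queue:
        current = queue.popleft()
        for neighbour in graph[current]:
            if neighbour not in seen:  # skipping visited parts prevents cycles
                seen.add(neighbour)
                queue.append(neighbour)

    seen.discard(part_id)  # a part is not its own alternate
    return sorted(seen)

print(all_alternates(1, alternate_part))
```

For part 1 this yields parts 5 and 9, matching the result requested in the question even though 1 and 9 are not directly linked.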