get X number of non null columns as single string - sql

I am trying to find the SQL command to do something but I don't know how to explain it so I'll use an example. I have a table like so:
| one | two | three | four |
|-----|-----|-------|------|
| a | h | i | j |
| b | k | l | |
| c | m | n | o |
| d | p | | |
| e | q | | |
| f | r | s | |
| g | t | | |
I need to create new columns that take the first non-null column from the right and kind of reverse it going up and joining/concatenating the fields.
| one | 1-up | 2-up | 3-up |
|-----|------|------|---------|
| a | j | j, i | j, i, h |
| b | l | l, k | |
| c | o | o, n | o, n, m |
| d | p | | |
| e | q | | |
| f | s | s, r | |
| g | t | | |
For b, since column four doesn't have data it uses three as the first value. Same for the other rows.
I hope this makes sense. I'm not sure how else to explain this.

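You can use COALESCE like this. It relies on string concatenation with + returning NULL whenever any operand is NULL (as SQL Server does by default), so each expression falls through to the next one when a column is empty: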
select one, COALESCE(four,three,two,'') as '1-up',
COALESCE(four+','+three,three+','+two,'') as '2-up',
COALESCE(four+','+three+','+two,'') as '3-up'
from Table1
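The COALESCE approach above can be tried without a database server. The following is a minimal sketch using Python's built-in sqlite3 module, assuming the empty cells are stored as NULL. SQLite concatenates with || rather than +, but it propagates NULL in the same way, so the fall-through logic is unchanged. The three sample rows are taken from the question; the ", " separator is chosen to match the desired output.

```python
import sqlite3

# In-memory database with a few rows from the question (None becomes NULL).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (one TEXT, two TEXT, three TEXT, four TEXT)")
conn.executemany(
    "INSERT INTO Table1 VALUES (?, ?, ?, ?)",
    [("a", "h", "i", "j"), ("b", "k", "l", None), ("d", "p", None, None)],
)

# Same idea as the answer, with SQLite's || operator instead of +.
# A concatenation involving NULL is NULL, so COALESCE moves to the next option.
query = """
SELECT one,
       COALESCE(four, three, two, '')                            AS "1-up",
       COALESCE(four || ', ' || three, three || ', ' || two, '') AS "2-up",
       COALESCE(four || ', ' || three || ', ' || two, '')        AS "3-up"
FROM Table1
ORDER BY one
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Each additional source column would need one more alternative per COALESCE, so this pattern suits a small, fixed number of columns.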

Related

Pandas Pivot-Table Containing List

I'd like to create a pivot table with the counts of values in a list, filtered by another column but am not sure how to use pandas pivot table (or function) with a list.
Here's an example what I'd like to do:
| Col1 | Col2 |
| --- | ----------- |
| A | ["e", "f"] |
| B | ["g", "f"] |
| C | ["g", "h"] |
| A | ["e", "g"] |
| B | ["g", "f"] |
| C | ["g", "e"] |
Ideal Pivot Table
| 1 | 2 |count|
| A | e | 2 |
| | f | 1 |
| | g | 1 |
| B | g | 2 |
| | f | 2 |
| C | g | 2 |
| | h | 1 |
| | e | 1 |
I cannot use a list to make a pivot table and am struggling to figure out how to modify the data or find a different method. Any help would be much appreciated!
Try this:
cols = ['Col1','Col2']
df.explode('Col2').groupby(cols).size()
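For reference, here is a self-contained version of the snippet above, assuming Col2 holds real Python lists (not strings that merely look like lists). The DataFrame is rebuilt from the question's sample data.

```python
import pandas as pd

# Sample data from the question; Col2 contains actual Python lists.
df = pd.DataFrame({
    "Col1": ["A", "B", "C", "A", "B", "C"],
    "Col2": [["e", "f"], ["g", "f"], ["g", "h"],
             ["e", "g"], ["g", "f"], ["g", "e"]],
})

cols = ["Col1", "Col2"]
# explode() gives each list element its own row; size() counts each pair.
counts = df.explode("Col2").groupby(cols).size()
print(counts)
```

The result is a Series with a two-level index (Col1, Col2); calling `counts.reset_index(name="count")` turns it into a flat table if that is easier to work with.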

Transform data in rows to columns

I'm trying to transform data of the following form:
| ID | X | Y |
--------------
| 1 | a | m |
| 1 | b | n |
| 1 | c | o |
| 2 | d | p |
| 2 | e | q |
| 3 | f | r |
| 3 | g | s |
| 3 | h | |
To this form:
| ID | X1 | X2 | X3 | Y1 | Y2 | Y3 |
------------------------------------
| 1 | a | b | c | m | n | o |
| 2 | d | e | | p | q | |
| 3 | f | g | h | r | s | |
What is the best way to accomplish this in SQL Server 2017? Is there a better way to do transformations like this using another tool?
I don't think you can solve this problem on the DB side; you should do it in backend code. The PIVOT function would work if you only wanted to turn row values into columns, but here you also want to group them by repeated IDs. I would start by counting the duplicates with the query below, which returns the maximum number of rows that share an ID. For example, ID 1 appears 3 times, so your backend code needs a data table with 3x2+1=7 columns (the extra 1 is the ID column). After that, you can fill the table by reading the rows for each ID.
WITH Temp (id, count)
AS
(
Select id, count(*)
from MyTable
group by id
having count(*)>1
)
select max(count) from Temp
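The backend step described above could look roughly like this. It is only a sketch in plain Python: the function name `widen` and the in-memory list of rows are made up for illustration, and the rows are assumed to have already been fetched from MyTable in the desired order.

```python
from collections import defaultdict

# Rows as (ID, X, Y), copied from the question; None marks the empty cell.
rows = [
    (1, "a", "m"), (1, "b", "n"), (1, "c", "o"),
    (2, "d", "p"), (2, "e", "q"),
    (3, "f", "r"), (3, "g", "s"), (3, "h", None),
]

def widen(rows):
    """Turn (ID, X, Y) rows into one row per ID: ID, X1..Xn, Y1..Yn."""
    groups = defaultdict(list)
    for id_, x, y in rows:
        groups[id_].append((x, y))
    # The largest group decides the width (what the max(count) query returns).
    width = max(len(pairs) for pairs in groups.values())
    header = (["ID"]
              + [f"X{i}" for i in range(1, width + 1)]
              + [f"Y{i}" for i in range(1, width + 1)])
    table = []
    for id_, pairs in groups.items():
        pad = [None] * (width - len(pairs))  # fill short groups with blanks
        xs = [x for x, _ in pairs] + pad
        ys = [y for _, y in pairs] + pad
        table.append([id_] + xs + ys)
    return header, table

header, table = widen(rows)
print(header)
for line in table:
    print(line)
```

With the sample data the header has 3x2+1=7 columns, matching the arithmetic in the answer.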

Creating a relational database from N dimensional table

I have a bunch of functions that have many input arguments but relatively few unique outputs, and I'm trying to tabulate the outputs so I can look up against the inputs. Typically I will have a set of 2D tables for a given function, with some rules which determine which table to look at. So for example, if the value of some variable is less than 1, I should look at table 1, but if the variable is greater than or equal to 1 I should look at table 2 to determine the output (but more typically the table to use would depend on the value of 5 or 6 variables). There is an additional problem - table 2 and table 1 don't necessarily have the same rows & columns.
My solution so far has been to create flat file tables containing all of the information for all of the tables (so I end up with one big complicated table) and the values in the tables are represented in one single column at the end of the table (with the inputs being represented in all of the columns before that). Because not all of the tables are the same, that means I have a lot of zeros or blanks (where a table doesn't apply). This solution is working for my purposes but as I try to tabulate more functions the table gets more and more complex, and it's fairly essential that somebody is able to easily check and verify the table values. I was hoping that there would be a clearer way of presenting the information using a relational database, but I'm not sure whether this would make things more or less clear. Using my sorted flat file has made it reasonably clear so far but as the tables get more complex it's become more and more difficult for humans to read, and problems are beginning to emerge.
I've looked into database design principles but all of the introductory material I have found has used much simpler examples than mine, and I'm not sure how to extend what I've read to meet my needs. My understanding is that if I have N inputs to my function then I'll need N+1 tables to create a relational database; so I'm not sure if that will make things clearer or less clear. I'm hoping this is a common problem for database experts and that someone will have some advice!
Edit:
An example has been requested, so I've made one up which is similar to my problem. Suppose I want to work out what the price of clothing was on a given date. I have three 2D tables which give me that information:
Before 1/1/2010:
+--------+----------+----------+----------+----------+
| | Fabric A | Fabric B | Fabric C | Fabric D |
+--------+----------+----------+----------+----------+
| Size S | 1 | 2 | 3 | 4 |
| Size M | 5 | 6 | 7 | 8 |
| Size L | 9 | 10 | 11 | 12 |
+--------+----------+----------+----------+----------+
After 1/1/2010:
If Designer is P:
+--------+------------+------------+------------+
| | Location X | Location Y | Location Z |
+--------+------------+------------+------------+
| Size S | 1 | 2 | 3 |
| Size M | 5 | 6 | 7 |
| Size L | 9 | 10 | 11 |
+--------+------------+------------+------------+
If Designer is Q:
+--------+------------+------------+------------+
| | Location X | Location Y | Location Z |
+--------+------------+------------+------------+
| Size S | 2 | 2 | 3 |
| Size M | 4 | 5 | 5 |
| Size L | 6 | 7 | 8 |
+--------+------------+------------+------------+
So that's the three tables. What I've done is created a table something like this:
+------------+----------+------+--------+----------+-------+
| Date | Designer | Size | Fabric | Location | Price |
+------------+----------+------+--------+----------+-------+
| 01/01/2010 | P | L | A | X | 6 |
| 01/01/2010 | P | L | A | Y | 7 |
| 01/01/2010 | P | L | A | Z | 8 |
| 01/01/2010 | P | L | B | X | 6 |
| 01/01/2010 | P | L | B | Y | 7 |
| 01/01/2010 | P | L | B | Z | 8 |
| 01/01/2010 | P | L | C | X | 6 |
| 01/01/2010 | P | L | C | Y | 7 |
| 01/01/2010 | P | L | C | Z | 8 |
| 01/01/2010 | P | L | D | X | 6 |
| 01/01/2010 | P | L | D | Y | 7 |
| 01/01/2010 | P | L | D | Z | 8 |
| 01/01/2010 | P | M | A | X | 4 |
| 01/01/2010 | P | M | A | Y | 5 |
| 01/01/2010 | P | M | A | Z | 5 |
| 01/01/2010 | P | M | B | X | 4 |
| 01/01/2010 | P | M | B | Y | 5 |
| 01/01/2010 | P | M | B | Z | 5 |
| 01/01/2010 | P | M | C | X | 4 |
| 01/01/2010 | P | M | C | Y | 5 |
| 01/01/2010 | P | M | C | Z | 5 |
| 01/01/2010 | P | M | D | X | 4 |
| 01/01/2010 | P | M | D | Y | 5 |
| 01/01/2010 | P | M | D | Z | 5 |
| 01/01/2010 | P | S | A | X | 2 |
| 01/01/2010 | P | S | A | Y | 2 |
| 01/01/2010 | P | S | A | Z | 3 |
| 01/01/2010 | P | S | B | X | 2 |
| 01/01/2010 | P | S | B | Y | 2 |
| 01/01/2010 | P | S | B | Z | 3 |
| 01/01/2010 | P | S | C | X | 2 |
| 01/01/2010 | P | S | C | Y | 2 |
| 01/01/2010 | P | S | C | Z | 3 |
| 01/01/2010 | P | S | D | X | 2 |
| 01/01/2010 | P | S | D | Y | 2 |
| 01/01/2010 | P | S | D | Z | 3 |
| 01/01/2010 | Q | L | A | X | 9 |
| 01/01/2010 | Q | L | A | Y | 10 |
| 01/01/2010 | Q | L | A | Z | 11 |
| 01/01/2010 | Q | L | B | X | 9 |
| 01/01/2010 | Q | L | B | Y | 10 |
| 01/01/2010 | Q | L | B | Z | 11 |
| 01/01/2010 | Q | L | C | X | 9 |
| 01/01/2010 | Q | L | C | Y | 10 |
| 01/01/2010 | Q | L | C | Z | 11 |
| 01/01/2010 | Q | L | D | X | 9 |
| 01/01/2010 | Q | L | D | Y | 10 |
| 01/01/2010 | Q | L | D | Z | 11 |
| 01/01/2010 | Q | M | A | X | 5 |
| 01/01/2010 | Q | M | A | Y | 6 |
| 01/01/2010 | Q | M | A | Z | 7 |
| 01/01/2010 | Q | M | B | X | 5 |
| 01/01/2010 | Q | M | B | Y | 6 |
| 01/01/2010 | Q | M | B | Z | 7 |
| 01/01/2010 | Q | M | C | X | 5 |
| 01/01/2010 | Q | M | C | Y | 6 |
| 01/01/2010 | Q | M | C | Z | 7 |
| 01/01/2010 | Q | M | D | X | 5 |
| 01/01/2010 | Q | M | D | Y | 6 |
| 01/01/2010 | Q | M | D | Z | 7 |
| 01/01/2010 | Q | S | A | X | 1 |
| 01/01/2010 | Q | S | A | Y | 2 |
| 01/01/2010 | Q | S | A | Z | 3 |
| 01/01/2010 | Q | S | B | X | 1 |
| 01/01/2010 | Q | S | B | Y | 2 |
| 01/01/2010 | Q | S | B | Z | 3 |
| 01/01/2010 | Q | S | C | X | 1 |
| 01/01/2010 | Q | S | C | Y | 2 |
| 01/01/2010 | Q | S | C | Z | 3 |
| 01/01/2010 | Q | S | D | X | 1 |
| 01/01/2010 | Q | S | D | Y | 2 |
| 01/01/2010 | Q | S | D | Z | 3 |
| 01/01/1900 | P | L | A | X | 9 |
| 01/01/1900 | P | L | A | Y | 9 |
| 01/01/1900 | P | L | A | Z | 9 |
| 01/01/1900 | P | L | B | X | 10 |
| 01/01/1900 | P | L | B | Y | 10 |
| 01/01/1900 | P | L | B | Z | 10 |
| 01/01/1900 | P | L | C | X | 11 |
| 01/01/1900 | P | L | C | Y | 11 |
| 01/01/1900 | P | L | C | Z | 11 |
| 01/01/1900 | P | L | D | X | 12 |
| 01/01/1900 | P | L | D | Y | 12 |
| 01/01/1900 | P | L | D | Z | 12 |
| 01/01/1900 | P | M | A | X | 5 |
| 01/01/1900 | P | M | A | Y | 5 |
| 01/01/1900 | P | M | A | Z | 5 |
| 01/01/1900 | P | M | B | X | 6 |
| 01/01/1900 | P | M | B | Y | 6 |
| 01/01/1900 | P | M | B | Z | 6 |
| 01/01/1900 | P | M | C | X | 7 |
| 01/01/1900 | P | M | C | Y | 7 |
| 01/01/1900 | P | M | C | Z | 7 |
| 01/01/1900 | P | M | D | X | 8 |
| 01/01/1900 | P | M | D | Y | 8 |
| 01/01/1900 | P | M | D | Z | 8 |
| 01/01/1900 | P | S | A | X | 1 |
| 01/01/1900 | P | S | A | Y | 1 |
| 01/01/1900 | P | S | A | Z | 1 |
| 01/01/1900 | P | S | B | X | 2 |
| 01/01/1900 | P | S | B | Y | 2 |
| 01/01/1900 | P | S | B | Z | 2 |
| 01/01/1900 | P | S | C | X | 3 |
| 01/01/1900 | P | S | C | Y | 3 |
| 01/01/1900 | P | S | C | Z | 3 |
| 01/01/1900 | P | S | D | X | 4 |
| 01/01/1900 | P | S | D | Y | 4 |
| 01/01/1900 | P | S | D | Z | 4 |
| 01/01/1900 | Q | L | A | X | 9 |
| 01/01/1900 | Q | L | A | Y | 9 |
| 01/01/1900 | Q | L | A | Z | 9 |
| 01/01/1900 | Q | L | B | X | 10 |
| 01/01/1900 | Q | L | B | Y | 10 |
| 01/01/1900 | Q | L | B | Z | 10 |
| 01/01/1900 | Q | L | C | X | 11 |
| 01/01/1900 | Q | L | C | Y | 11 |
| 01/01/1900 | Q | L | C | Z | 11 |
| 01/01/1900 | Q | L | D | X | 12 |
| 01/01/1900 | Q | L | D | Y | 12 |
| 01/01/1900 | Q | L | D | Z | 12 |
| 01/01/1900 | Q | M | A | X | 5 |
| 01/01/1900 | Q | M | A | Y | 5 |
| 01/01/1900 | Q | M | A | Z | 5 |
| 01/01/1900 | Q | M | B | X | 6 |
| 01/01/1900 | Q | M | B | Y | 6 |
| 01/01/1900 | Q | M | B | Z | 6 |
| 01/01/1900 | Q | M | C | X | 7 |
| 01/01/1900 | Q | M | C | Y | 7 |
| 01/01/1900 | Q | M | C | Z | 7 |
| 01/01/1900 | Q | M | D | X | 8 |
| 01/01/1900 | Q | M | D | Y | 8 |
| 01/01/1900 | Q | M | D | Z | 8 |
| 01/01/1900 | Q | S | A | X | 1 |
| 01/01/1900 | Q | S | A | Y | 1 |
| 01/01/1900 | Q | S | A | Z | 1 |
| 01/01/1900 | Q | S | B | X | 2 |
| 01/01/1900 | Q | S | B | Y | 2 |
| 01/01/1900 | Q | S | B | Z | 2 |
| 01/01/1900 | Q | S | C | X | 3 |
| 01/01/1900 | Q | S | C | Y | 3 |
| 01/01/1900 | Q | S | C | Z | 3 |
| 01/01/1900 | Q | S | D | X | 4 |
| 01/01/1900 | Q | S | D | Y | 4 |
| 01/01/1900 | Q | S | D | Z | 4 |
+------------+----------+------+--------+----------+-------+
I can't use the little tables for my purposes (I don't think), but I can easily use the big one. However, this introduces a secondary requirement: other people now need to be able to routinely check that the big table completely contains all of the information from the little tables. It's not that hard for somebody to check if a given price from the big table is consistent with a price from the appropriate little table, but as we add more tables and more parameters become involved it becomes very difficult to spot other problems (for example, a missing entry). The question that will need to be answered is "can the big table be used to correctly look up all of the possible prices for an item of clothing?".
My current thinking is that I'd like to set up small tables which are very easy to check, and perhaps automate the process of generating the big table so that confidence in the process => confidence in the big table. I also wonder whether the big table is even necessary for me to do the lookup that I want to be able to do, or if there is a smart way to go and fetch the outputs from the little tables directly (using a clever database design, perhaps?).
Perhaps this is just a hard problem, but I was wondering if there is a solution that is clearly better than others.
Edit:
Thanks for all the comments so far. Unfortunately I'm struggling to understand a couple of the answers and it seems like the problem is still too vague to answer properly. I'll just try to flesh out my problem in the context of the example already given.
In my clothing example above, I am saying that the price of an item of clothing, in general, is a function of date sold, location sold, designer, size, fabric. I have created a table to express a relationship between these inputs and the price.
However, many of the rows in this table don't benefit from all of the columns - for anything sold before 1/1/2010 there is no dependency on designer, for example. To encode that in my large table, I have had to add a lot of extra rows to ensure that anything sold before 1/1/2010 gives the same answer for designer P as for designer Q for any other combination of inputs. This seems inefficient, but I'm not sure how (or whether) this could be formulated better. I've tried to understand the process of normalisation, but I'm struggling to see how that would work for this clothing example - and further to that, I'm not sure whether normalisation would make the table clearer (as that is my main goal now).
As an additional business constraint, I could receive more information at any time about prices, or new methods for pricing. So it's entirely conceivable that I may have to add a couple more inputs (columns) to my large table, and a whole load more rows to capture how the price now depends on every input, even if that only results in one or two distinct new price points. In my example I have three small tables, but in reality I will have hundreds. That's a lot of information and there's no way around that, but I'm sure there is a way of doing this that doesn't require either hundreds of small tables or many thousands of essentially redundant rows in a big table. I'm just struggling to see how I would break it down. Is there a way of breaking it down which is clearer to the human eye? Is there a way of formulating it that means that when a new input turns up, I don't have to add 1000 redundant rows to my table? Is normalisation what I need to do?
I haven't fully appreciated the answers and resources that have already been given so I'll continue to try to digest those, but hopefully this edit resolves some of the ambiguity about what I'm asking.
TL;DR Your text tables are pictures of relations & functions. You need to forget the formatting and "2D" and "flat" and just determine the dimensions whose values are related per the application relationship that a table represents. If you want to display a relation in a certain format, that's a graphical user interface issue.
You need to read and learn about the relational model, information modeling and normalization.
The canonical picture (but it's just a picture) when using relational databases is to put all the dimensions of an application relationship along the top.
Price is the price for size Size with fabric Fabric after 1/1/2010 designed by P
+------+--------+-------+
| Size | Fabric | Price |
+------+--------+-------+
| S | A | 1 |
| S | B | 2 |
...
| L | D | 12 |
+------+--------+-------+
But wait! The table above is part of some table like:
Price is the price for size Size with fabric Fabric after After designed by Designer AND After = 1/1/2010 AND Designer = P
+------+--------+------+----------+----------+
| Size | Fabric | Price| Designer | After |
+------+--------+------+----------+----------+
| S | A | 1 | P | 1/1/2010 |
| S | B | 2 | P | 1/1/2010 |
...
| L | D | 12 | P | 1/1/2010 |
+------+--------+------+----------+----------+
which is part of some table like:
Price is the price for size Size with fabric Fabric after After designed by Designer
+------+--------+-------+----------+----------+
| Size | Fabric | Price | Designer | After |
+------+--------+-------+----------+----------+
| S | A | 1 | P | 1/1/2010 |
...
| M | D | 8 | Q | 1/1/1900 |
...
+------+--------+-------+----------+----------+
On the other hand if that table's label always meant "Price is the price for size Size with fabric Fabric AND Price is the price after After designed by Designer" and certain other things are so then we would only need the first table and this one:
Price is the price after After designed by Designer
+-------+----------+----------+
| Price | Designer | After |
+-------+----------+----------+
| 1 | P | 1/1/2010 |
...
| 8 | Q | 1/1/1900 |
...
+-------+----------+----------+
because of being able to reconstruct the larger one from the smaller ones.
A relation holds the rows that make some predicate aka parameterized statement about the business situation into a true statement. The join of two relations holds the rows that make the "AND" of their predicates into a true statement. (And union the "OR", etc.) Database design is about finding necessary and sufficient predicates to describe any business situation. We describe predicates/tables we want to see as queries in terms of those base predicates/tables. We try to simplify predicates/tables. We try to minimize predicates/tables being expressible in terms of others (in whole or part). We can format in various ways on output. We also determine constraints that describe valid database states (equivalently, application situations that can arise) so the DBMS can reject invalid update attempts.
When designing, once we have some predicates/tables that completely describe our business situation we make predicates/tables smaller and independent where possible via normalization which is based on functional dependencies and join dependencies. Information modeling methods generally try to produce base predicates/tables that are already at least somewhat normalized.
Functional dependencies involve predicates/tables where columns are functions of other columns. Join dependencies involve predicates/tables using "AND". (Every functional dependency comes with a join dependency.) Normalization involves replacing predicates/tables by smaller ones to get rid of problematic functional and join dependencies.
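To make the "join is AND" idea concrete, here is a small sketch in Python that treats a relation as a list of rows (dicts) and implements a natural join by hand. The function name `natural_join` and the two tiny sample relations are illustrative only; they reuse a few rows from the tables pictured above and are not a recommended design for the clothing data.

```python
def natural_join(r, s):
    """Rows that satisfy both predicates: combine pairs of rows that
    agree on every column the two relations have in common."""
    joined = []
    for a in r:
        for b in s:
            common = a.keys() & b.keys()
            if all(a[k] == b[k] for k in common):
                joined.append({**a, **b})
    return joined

# "Price is the price for size Size with fabric Fabric"
size_fabric_price = [
    {"Size": "S", "Fabric": "A", "Price": 1},
    {"Size": "L", "Fabric": "D", "Price": 12},
]
# "Price is the price after After designed by Designer"
price_designer_after = [
    {"Price": 1, "Designer": "P", "After": "1/1/2010"},
    {"Price": 8, "Designer": "Q", "After": "1/1/1900"},
]

# Rows for which BOTH statements are true (matched on the shared Price column).
joined = natural_join(size_fabric_price, price_designer_after)
print(joined)
```

In a real DBMS the same thing is expressed with a JOIN in a query; the point here is only that the larger table can be recomputed from the smaller ones instead of being stored.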
"As an additional business constraint, I could receive more information at any time about prices, or new methods for pricing. So it's entirely conceivable that I may have to add a couple more inputs (columns) to my large table, and whole load more rows to capture how the price now depends on every input, even if that only results in one or two distinct new price points. In my example I have three small tables, but in reality I will have hundreds"
I think that this is a red flag, telling you that a purely relational storage mechanism is not going to work very well for you.
To be honest, I'm not sure what the appropriate mechanism is, but it seems that you need to be able to pass a product and a date into a function that then uses business rules to determine the appropriate price, and hence I'm thinking you need something like a business rules engine.
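As a rough illustration of that idea (not a full rules engine), the sketch below keeps the three small tables from the question as they are and puts the "which table applies" rule into a function. All names (`grid`, `BY_FABRIC`, `BY_LOCATION`, `CUTOVER`, `price`) are invented for this example, and it assumes that a sale on 1/1/2010 itself falls under the "after" tables.

```python
from datetime import date

def grid(row_keys, col_keys, values):
    """Turn a 2D table into a {(row, col): value} lookup."""
    return {(r, c): v
            for r, row in zip(row_keys, values)
            for c, v in zip(col_keys, row)}

SIZES = ["S", "M", "L"]

# Before 1/1/2010: price depends on size and fabric only.
BY_FABRIC = grid(SIZES, ["A", "B", "C", "D"],
                 [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

# From 1/1/2010: price depends on designer, size and location.
BY_LOCATION = {
    "P": grid(SIZES, ["X", "Y", "Z"], [[1, 2, 3], [5, 6, 7], [9, 10, 11]]),
    "Q": grid(SIZES, ["X", "Y", "Z"], [[2, 2, 3], [4, 5, 5], [6, 7, 8]]),
}

CUTOVER = date(2010, 1, 1)  # assumption: the cutover date belongs to "after"

def price(sold, designer, size, fabric, location):
    """Pick the applicable small table by rule, then look the price up."""
    if sold < CUTOVER:
        return BY_FABRIC[(size, fabric)]
    return BY_LOCATION[designer][(size, location)]

print(price(date(2005, 6, 1), "Q", "M", "D", "X"))
print(price(date(2012, 1, 1), "Q", "L", "A", "Y"))
```

Each small table stays easy to check by eye, and inputs that a rule ignores (designer and location before the cutover, fabric after it) never need redundant rows. If the big table is still wanted, it could be generated by calling `price` over every combination of inputs.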

Determine/obtain/find management chain (hierarchy) for employees using SQL

I have a table (in an MS Access DB) with every employee at a company. The table also has a column to indicate each employee's manager. There are columns (with no data) to indicate that employee's full management chain to the CEO. I need to determine/obtain/find and fill this data using SQL (and/or VBA).
I know what needs to be done but I'm drawing a complete blank on how to do it efficiently.
I know I could go row by row but that seems so inefficient. There has to be a better way.
For example, take this table below:
+------------+-----------+----------+----------+----------+
| employeeID | managerID | manager1 | manager2 | manager3 |
+------------+-----------+----------+----------+----------+
| a | | | | |
| b | a | | | |
| c | a | | | |
| d | a | | | |
| e | b | | | |
| f | b | | | |
| g | c | | | |
| h | c | | | |
| i | d | | | |
| j | e | | | |
| k | f | | | |
| l | g | | | |
+------------+-----------+----------+----------+----------+
a has no manager (CEO)
b's manager is a
l's manager is g whose manager is c whose manager is a
etc...
So this would result in the table:
+------------+-----------+----------+----------+----------+
| employeeID | managerID | manager1 | manager2 | manager3 |
+------------+-----------+----------+----------+----------+
| a | | | | |
| b | a | a | | |
| c | a | a | | |
| d | a | a | | |
| e | b | a | b | |
| f | b | a | b | |
| g | c | a | c | |
| h | c | a | c | |
| i | d | a | d | |
| j | e | a | b | e |
| k | f | a | b | f |
| l | g | a | c | g |
+------------+-----------+----------+----------+----------+
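The chain-walking described above (follow managerID upward until there is no manager, then list the chain from the CEO down) can be sketched as follows. This is plain Python rather than Access SQL or VBA, and it assumes the employeeID/managerID pairs have been read into a dict; the names `manager_of` and `management_chain` are invented for the example.

```python
# employeeID -> managerID, copied from the example table (None for the CEO).
manager_of = {
    "a": None, "b": "a", "c": "a", "d": "a",
    "e": "b", "f": "b", "g": "c", "h": "c",
    "i": "d", "j": "e", "k": "f", "l": "g",
}

def management_chain(manager_of, employee):
    """Return the employee's managers ordered from the CEO downwards."""
    chain = []
    seen = {employee}
    current = manager_of.get(employee)
    while current:
        if current in seen:  # guard against bad data that loops forever
            raise ValueError(f"cycle in management data at {current!r}")
        seen.add(current)
        chain.append(current)
        current = manager_of.get(current)
    chain.reverse()  # collected bottom-up, wanted top-down
    return chain

for emp in manager_of:
    print(emp, management_chain(manager_of, emp))
```

The first entry of each returned list corresponds to manager1, the second to manager2, and so on. Each lookup is a dictionary access, so the whole table is processed in one pass without repeated queries.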

Map column data to matching rows

I have a sheet like this:
| A | B | C | D | E | F | G | H | ...
---------------------------------
| a | 1 | | b | 2 | | c | 7 |
---------------------------------
| b | 2 | | c | 8 | | b | 4 |
---------------------------------
| c |289| | a | 3 | | a |118|
---------------------------------
| d | 6 | | e | 3 | | e |888|
---------------------------------
| e | 8 | | d |111| | d |553|
---------------------------------
I want the sheet to become like this:
| A | B | C | D | E | F | G | H | ...
---------------------------------
| a | 1 | 3 |118| | | | |
---------------------------------
| b | 2 | 2 | 4 | | | | |
---------------------------------
| c |289| 8 | 7 | | | | |
---------------------------------
| d | 6 |111|553| | | | |
---------------------------------
| e | 8 | 3 |888| | | | |
---------------------------------
Col A, Col D and Col G contain letters which are unique within each column, and the column next to each of them holds the weights.
To make it even more clear,
| A | B |
---------
| a | 1 |
---------
| b | 2 |
---------
| c |289|
...
are the weights of a,b,c... in January
Similarly | D | E | are weights of a,b,c... in July and | G | H | are weights of a,b,c... in December
I need to put them side-by-side for comparison, the thing is they are NOT in order.
How do I approach this?
UPDATE
There are thousands of a,b,c, aa, bb, cc, aaa, avb, as, saf, sfa etc.. and some of them MAY be present in January (Col A) and not in July (Col D)
Something like this:
Sub Squeeze()
[c1:c5] = Application.Index([E1:E5], Evaluate("IF(A1:A5<>"""",MATCH(A1:A5,D1:D5,0),A1:A5)"), 1)
[d1:d5] = Application.Index([H1:h5], Evaluate("IF(A1:A5<>"""",MATCH(A1:A5,G1:G5,0),A1:A5)"), 1)
[e1:h5].ClearContents
End Sub
Explanation of first line
Application.Index([E1:E5], Evaluate("IF(A1:A5<>"""",MATCH(A1:A5,D1:D5,0),A1:A5)"), 1)
MATCH returns an array holding the position in D1:D5 of each of the (5) values in A1:A5
INDEX then returns the corresponding values from E1:E5
So to use the key column of A1:A100 against M1:M100 with values in N1:N100:
Application.Index([N1:N100], Evaluate("IF(A1:A100<>"""",MATCH(A1:A100,M1:M100,0),A1:A100)"), 1)
Extend as necessary: Sort D:E by D ascending, sort G:H by G ascending, delete G,F,D,C. If you want VBA, do this with Record Macro selected.
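For comparison, the same match-by-key idea is sketched below in Python. It assumes each month's pair of columns has been read into a dict of key to weight; the function name `align` is invented for the example. A key that is present in January but missing from a later month yields None instead of an error, which covers the case mentioned in the question's update.

```python
def align(base, *others):
    """For each key in base (in base's order), collect its weight from
    base and from every other mapping; None where a key is missing."""
    return [[key, weight] + [other.get(key) for other in others]
            for key, weight in base.items()]

# Column pairs A:B, D:E and G:H from the question's sheet.
january = {"a": 1, "b": 2, "c": 289, "d": 6, "e": 8}
july = {"b": 2, "c": 8, "a": 3, "e": 3, "d": 111}
december = {"c": 7, "b": 4, "a": 118, "e": 888, "d": 553}

table = align(january, july, december)
for row in table:
    print(row)
```

Because each lookup is a dictionary access, this stays fast with thousands of keys.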