Is It A Good Idea or A Huge Mistake to Combine More Than 1 Type of Data Into A Single Column in An SQL Database Table? [closed]

So, let's say I have 5 items, A, B, C, D and E. Item A comes in sizes 1 and 2, item B comes in sizes 2 and 3, C comes in 1 and 3, D comes in 1 and E comes in 3. Now, I am considering 2 table options, as follow:
Table 1

Name  Size
A     1
A     2
B     2
B     3
C     1
C     3
D     1
E     3
Another option is Table 2, as follows:
Name
A1
A2
B2
B3
C1
C3
D1
E3
Now, which of these 2 tables is actually a better option? What are the advantages and disadvantages (if any) of each of the 2 tables above? One thing that I can think of is that, if I use table 1, I can easily extract all items by size, no matter what item I want. So, for instance, if I want to analyze this month's sales of items of size 1, it's easy to do it with Table 1. I can't seem to see the same advantage if I use table 2. What do you guys think? Please kindly enlighten me on this matter. Thank you in advance for your kind assistance, everyone. Cheers! :)

I don't even understand why you have the second table option - what purpose does it serve, or how does it help you? Plain and simple, you have a one-to-many relationship: an item comes in 1 or more different sizes. Just saying that sentence out loud should scream ONLY option 1. Option 2 will make your life a living hell, because you are going against normalization guidelines by packing 2 pieces of data into 1 column, and it has no real benefit.
Option 1 says I have an item and it can have one or more sizes associated with it.
Item Size
A 1
A 2
A 3
B 1
C 1
C 2
Then you can do simple queries like: give me all items that have more than 1 size; give me any item that only has 1 size; give me all the sizes of the item with id A; etc.
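A minimal sketch of those queries against option 1's schema, using Python's built-in sqlite3 module and an in-memory database (the table and column names here are made up for illustration):

```python
import sqlite3

# Hypothetical schema matching Table 1: one row per (name, size) pair.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (name TEXT, size INTEGER)")
conn.executemany("INSERT INTO items VALUES (?, ?)",
                 [("A", 1), ("A", 2), ("B", 2), ("B", 3),
                  ("C", 1), ("C", 3), ("D", 1), ("E", 3)])

# Items that come in more than one size
multi = conn.execute(
    "SELECT name FROM items GROUP BY name HAVING COUNT(*) > 1 ORDER BY name"
).fetchall()

# Items that come in exactly one size
single = conn.execute(
    "SELECT name FROM items GROUP BY name HAVING COUNT(*) = 1 ORDER BY name"
).fetchall()

# All sizes of item A
sizes_a = conn.execute(
    "SELECT size FROM items WHERE name = 'A' ORDER BY size"
).fetchall()

print(multi)    # [('A',), ('B',), ('C',)]
print(single)   # [('D',), ('E',)]
print(sizes_a)  # [(1,), (2,)]
```

None of these queries are possible against option 2 without string-splitting the combined column first, which is exactly the normalization problem described above.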


How to scrape a pivot table with Beautiful Soup [closed]

I'm trying to scrape a complex Wikipedia table (I'm not sure if it's appropriate to generalize such tables with the term "pivot table") using Beautiful Soup in hopes of recreating a simpler, more-analyzable version of it in Pandas.
JLPT "Applications and results" table on English Wikipedia
As an overview, moving from the left side: the table lists the years when JLPT was held, which exam levels were open that year, and then the statistics defined by the columns on top. The aggregated columns don't really matter for my purposes, although it'd be nice if there's a way to scrape and reconstruct them as such.
What makes the table difficult to reconstruct is that it has grouped rows (the years under column 'Year'), but the rows of that year are placed in the same hierarchical level as the year header, not under. Further, instead of having a <th> tag of the year in each <tr> row, it's only present in the first row of the year group:
HTML structure of the table
Another problem is that the year headers do not have any sort of defining identifiers in their tags or attributes, so I also can't pick only the rows with years in it.
These things make it impossible to group the rows by year.
So far, the only way I've been able to reconstruct some of the table is by:
1. scraping the entire table,
2. appending every <tr> element into a list,
3. deleting every instance of strings containing a [ (every year has a citation in square brackets), which leaves a uniform number of elements in every row,
4. converting them into a pandas dataframe (manually adding column names, removing leftover HTML using regex, etc.), without the years:
Row elements in a list
Processed dataframe (minus the years)
After coming this far, now I realize that it's still difficult to group the rows by years without doing so manually. I'm wondering if there's a simpler, more straightforward way of scraping similarly complex tables with only BeautifulSoup itself, and little to no postprocessing in pandas. In this case, it's okay if it's not possible to get the table in its original pivot format, I just want to have the year value for each row. Something like:
Dataframe goal
You do not need BeautifulSoup to do this. Instead, you can use pd.read_html directly to get what you need. When you read the HTML from Wikipedia, it pulls all of the tables on the page into a list. If you scan through the list, you will see that the one you want is at index 10.
df = pd.read_html('https://en.wikipedia.org/wiki/Japanese-Language_Proficiency_Test')[10]
From there, you'll do some data cleaning to create the table that you need.
# Convert the multi-level columns into single column names
df.columns = df.columns.map('_'.join)

# Fix column names
df = df.rename({'Year_Year': 'dummy_year',
                'Level_Level': 'level',
                'JLPT in Japan_Applicants': 'japan_applicants',
                'JLPT in Japan_Examinees': 'japan_examinees',
                'JLPT in Japan_Certified (%)': 'japan_certified',
                'JLPT overseas_Applicants': 'overseas_applicants',
                'JLPT overseas_Examinees': 'overseas_examinees',
                'JLPT overseas_Certified (%)': 'overseas_certified'},
               axis=1)

# Remove text in [] and (), remove commas, and convert to int
df['japan_certified'] = df['japan_certified'].str.replace(r'\([^)]*\)', '', regex=True).str.replace(',', '').astype(int)
df['overseas_certified'] = df['overseas_certified'].str.replace(r'\([^)]*\)', '', regex=True).str.replace(',', '').astype(int)
df['dummy_year'] = df['dummy_year'].str.replace(r'\[.*?\]', '', regex=True)
Output:
dummy_year level ... overseas_examinees overseas_certified
0 2007 1 kyū ... 110937 28550
1 2007 2 kyū ... 152198 40975
2 2007 3 kyū ... 113526 53806
3 2007 4 kyū ... 53476 27767
4 2008 1 kyū ... 116271 38988
.. ... ... ... ... ...
127 2022-1 N1 ... 49223 17282
128 2022-1 N2 ... 54542 25677
129 2022-1 N3 ... 41264 21058
130 2022-1 N4 ... 40120 19389
131 2022-1 N5 ... 30203 16132

Minimum number of Common Items in 2 Dynamic Stacks

I have a verbal algorithm question, so I have no code yet. The question is this: how can I design an algorithm over 2 dynamic stacks, each of which may or may not contain duplicate string items? For example, the first stack, say s1, holds 3 breads, 4 lemons and 2 pens, and the second stack, say s2, holds 5 breads, 3 lemons and 5 pens. I want to count the occurrences of each item in each stack, and print, for every item, the minimum of the two counts, for example:
bread --> 3
lemon --> 3
pen --> 2
How can I traverse the 2 stacks and print the minimum number of occurrences of each item once I reach the end of both stacks? If you are confused about anything, I can edit my question. Thanks.
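One way to sketch this in Python, assuming the stacks are represented as lists (top of stack = end of list): pop every element off each stack while counting occurrences, then take the per-item minimum. collections.Counter supports exactly that minimum via its intersection operator.

```python
from collections import Counter

# Hypothetical stacks from the example (top = end of list).
s1 = ["bread"] * 3 + ["lemon"] * 4 + ["pen"] * 2
s2 = ["bread"] * 5 + ["lemon"] * 3 + ["pen"] * 5

# Pop every element off each stack and count occurrences: O(n + m).
c1, c2 = Counter(), Counter()
while s1:
    c1[s1.pop()] += 1
while s2:
    c2[s2.pop()] += 1

# Counter intersection (&) keeps the minimum count per key.
common = c1 & c2
for item, count in sorted(common.items()):
    print(item, "-->", count)
# bread --> 3
# lemon --> 3
# pen --> 2
```

Items that appear in only one stack drop out of the intersection automatically, since their minimum count is zero.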

Reordering rows in sql database - idea

I was thinking about a simple way to reorder rows in a relational database table.
I would like to avoid method described here:
How can I reorder rows in sql database
My simple idea was to use a ListOrder column of type double-precision 64-bit IEEE 754 floating point.
When inserting a row between two existing rows, we calculate the listOrder value as the average of the two sibling elements.
Example:
1. Starting state:
value, listOrder
a 1
b 2
c 3
d 4
e 5
f 6
2. Moving "e" two rows up
One simple sql update on e-row: update mytable set listorder=2.5 where value='e'
value, listOrder
a 1
b 2
e 2.5
c 3
d 4
f 6
3. Moving "a" one position down
value, listOrder
b 2
a 2.25
e 2.5
c 3
d 4
f 6
I have a question: how many insertions can I perform (in the worst case) and still have a properly ordered list?
For a 64-bit integer, fewer than 64 insertions in the same place are possible.
Do floating-point types allow more insertions?
Are there other problems with the described approach?
Do you see any patches/adjustments to make this idea safe and usable in applications?
This is similar to a lexical order, which can also be done with varchar columns:
A
B
C
D
E
F
becomes
A
B
BM
C
D
F
becomes
B
BF
BM
C
D
F
I prefer the two-step process, where you update every row in the table after the one you move, incrementing each by one. SQL is efficient at this; updating the rows following a change is not as bad as it seems. You preserve something that is more human-readable, the storage size of your ordinal value grows with your data size rather than with the insertion pattern, and you don't risk reaching a point where you don't have enough precision to put an item between two values.
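The precision limit of the averaging idea is easy to demonstrate: a small sketch that keeps inserting at the midpoint of an ever-narrower interval shows roughly how many worst-case insertions a double allows before two midpoints collide (about one per mantissa bit).

```python
# Repeatedly insert between lo and the most recently inserted value,
# always taking the midpoint. Stop when the midpoint is no longer
# distinguishable from its neighbours in 64-bit floating point.
lo, hi = 1.0, 2.0
count = 0
while True:
    mid = (lo + hi) / 2.0
    if mid == lo or mid == hi:
        break  # ran out of precision: no value fits between lo and hi
    count += 1
    hi = mid  # the next insertion goes between lo and the new row

print(count)  # 52 -- one insertion per bit of the 52-bit mantissa
```

So a double buys you only a handful more worst-case insertions than a 64-bit integer ordinal, which supports the renumbering approach above for long-lived lists.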

Karnaugh map group sizes

Full disclosure, this is for an assignment. I don't think I'm looking for spoon-feeding, more so just a general question. Am I allowed to break that into a group of 8 and 2 groups of 4, or do all group sizes have to be equal, i.e. 4 groups of 4?
1 0 1 1
0 0 0 0
1 1 1 1
1 1 1 1
Sorry if this is obvious, but my searches haven't been explicit and my teacher was quite vague. Thanks!
TL;DR: Groups don't have to be equal in size.
Let's see what happens if, in your case, you take 11 groups of one. Then you will have an equation of eleven terms (i.e. case_1 or case_2 or ... case_11).
By making big groups, in your case 1 group of 8 and 2 groups of 4, you will have a very short, simplified equation like: case_group_8 or case_group_4_1 or case_group_4_2.
Both groupings are correct (we covered all the ones in the map), but the second is the most optimized (i.e. it cannot be simplified further).
Making 4 groups of 4 would give you an equation that can still be simplified further.
The best way now is for you to try both groupings (4/4/4/4 vs 8/4/4) and compare the resulting expressions.

How do I insert data with SQLite?

Total newbie here, regarding sqlite, so don't flame too hard :)
I have a table:
index name length L breadth B height H
1 M-1234 10 5 2
2 M-2345 20 10 3
3 ....
How do I put some tabular data (let's say ten x,y values) corresponding to index 1, then another table for index 2, and so on? In short, I want a table of x and y values that is "connected" to the first row, then another that is connected to the second row.
I'm reading some tutorials on sqlite3 (which I'm using), but am having trouble finding this. If anyone knows a good newbie tutorial or a book dealing with sqlite3 (CLI) I'm all ears for that too :)
You are just looking for information on joins and the concept of a foreign key, which SQLite3 doesn't enforce by default but is exactly what you need. You can go without it, anyway.
In your situation you can either add two columns to your table, one x and another y, or create a new table with 3 columns: foreign_index, x and y. Which one to use depends on what you are trying to accomplish, performance and maintainability.
If you go the linked table route, you'd end up with two tables, like this:
MyTable
index name length L breadth B height H
1 M-1234 10 5 2
2 M-2345 20 10 3
3 ....
XandY
foreign_index x y
1 12 9
2 8 7
3 ...
When you want the x and y values of your element, you just use something like SELECT x, y FROM XandY WHERE foreign_index = $idx;
To get all the related attributes, you just do a JOIN:
SELECT MyTable."index", name, length, breadth, height, x, y FROM MyTable INNER JOIN XandY ON MyTable."index" = XandY.foreign_index; (note that index is a reserved word in SQL, so it needs to be quoted when used as a column name)
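Putting the whole thing together, a minimal runnable sketch using Python's sqlite3 module (column and table names are illustrative; the index column is renamed idx here to avoid the SQL reserved word, and the foreign_keys pragma is turned on since SQLite does not enforce foreign keys by default):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforcement is off by default

conn.execute("""CREATE TABLE MyTable (
    idx     INTEGER PRIMARY KEY,
    name    TEXT,
    length  REAL,
    breadth REAL,
    height  REAL)""")
conn.execute("""CREATE TABLE XandY (
    foreign_index INTEGER REFERENCES MyTable(idx),
    x REAL,
    y REAL)""")

conn.execute("INSERT INTO MyTable VALUES (1, 'M-1234', 10, 5, 2)")
conn.execute("INSERT INTO MyTable VALUES (2, 'M-2345', 20, 10, 3)")
conn.executemany("INSERT INTO XandY VALUES (?, ?, ?)",
                 [(1, 12, 9), (1, 8, 7), (2, 3, 4)])

# All x,y values attached to the first row
rows = conn.execute(
    "SELECT x, y FROM XandY WHERE foreign_index = ? ORDER BY x", (1,)
).fetchall()
print(rows)  # [(8.0, 7.0), (12.0, 9.0)]

# Join both tables to get each item's name with its points
joined = conn.execute("""
    SELECT name, x, y
    FROM MyTable
    INNER JOIN XandY ON MyTable.idx = XandY.foreign_index
    ORDER BY name, x""").fetchall()
print(joined)  # [('M-1234', 8.0, 7.0), ('M-1234', 12.0, 9.0), ('M-2345', 3.0, 4.0)]
```

With the pragma on, inserting an XandY row whose foreign_index has no matching MyTable row raises an IntegrityError, which is the "connection" the question asks for.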