final case

 

Chapter 5 exercises

20. Consider the task of building a classifier from random data, where the attribute values are generated randomly irrespective of the class labels. Assume the data set contains records from two classes, “+” and “−.” Half of the data set is used for training while the remaining half is used for testing.

(a) Suppose there are an equal number of positive and negative records in the data and the decision tree classifier predicts every test record to be positive. What is the expected error rate of the classifier on the test data?

(b) Repeat the previous analysis assuming that the classifier predicts each test record to be positive class with probability 0.8 and negative class with probability 0.2.

(c) Suppose two-thirds of the data belong to the positive class and the remaining one-third belong to the negative class. What is the expected error of a classifier that predicts every test record to be positive?

(d) Repeat the previous analysis assuming that the classifier predicts each test record to be positive class with probability 2/3 and negative class with probability 1/3.
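
One way to sanity-check answers to parts (a) through (d) is sketched below. It assumes the classifier's prediction is statistically independent of the true label, which follows from the attributes being random, and that the test half has the same class proportions as the full data set. The function name `expected_error` is a hypothetical helper, not something from the book.

```python
from fractions import Fraction

def expected_error(p_predict_pos, p_actual_pos):
    """Expected error rate when predictions are independent of the true labels.

    An error occurs when a negative record is predicted positive,
    or a positive record is predicted negative.
    """
    false_pos = p_predict_pos * (1 - p_actual_pos)
    false_neg = (1 - p_predict_pos) * p_actual_pos
    return false_pos + false_neg

# (prediction probability for "+", fraction of "+" records) for parts (a)-(d)
cases = {
    "a": (Fraction(1), Fraction(1, 2)),
    "b": (Fraction(4, 5), Fraction(1, 2)),
    "c": (Fraction(1), Fraction(2, 3)),
    "d": (Fraction(2, 3), Fraction(2, 3)),
}
for part, (p_pred, p_pos) in cases.items():
    print(part, expected_error(p_pred, p_pos))
```

Exact fractions are used so that the results can be compared with a hand derivation without rounding.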

Chapter 6 exercises

5. Prove Equation 6.3 in the book. (Hint: First, count the number of ways to create an itemset that forms the left-hand side of the rule. Next, for each size-k itemset selected for the left-hand side, count the number of ways to choose the remaining d − k items to form the right-hand side of the rule.)
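
The counting argument in the hint can be checked numerically before writing the proof. The sketch below assumes Equation 6.3 is the rule-count formula R = 3^d − 2^(d+1) + 1 for a data set with d items; verify this against your edition of the book. The function names are hypothetical.

```python
from math import comb

def count_rules(d):
    """Count rules X -> Y with X and Y non-empty and disjoint, following the hint."""
    total = 0
    for k in range(1, d):                  # size of the left-hand side
        remaining = d - k                  # items still available for the right-hand side
        for j in range(1, remaining + 1):  # size of the right-hand side
            total += comb(d, k) * comb(remaining, j)
    return total

def closed_form(d):
    """Assumed form of Equation 6.3."""
    return 3**d - 2**(d + 1) + 1

for d in range(2, 8):
    assert count_rules(d) == closed_form(d)
print(count_rules(6))  # 602
```

A numerical check is not a proof, but it confirms that the double sum in the hint is the right quantity to simplify.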

17. Suppose we have market basket data consisting of 100 transactions and 20 items. The support for item a is 25%, the support for item b is 90%, and the support for itemset {a, b} is 20%. Let the support and confidence thresholds be 10% and 60%, respectively.

(a) Compute the confidence of the association rule {a} -> {b}. Is the rule interesting according to the confidence measure?

(b) Compute the interest measure for the association pattern {a, b}. Describe the nature of the relationship between item a and item b in terms of the interest measure.

(c) What conclusions can you draw from the results of parts (a) and (b)?

(d) NOT NEEDED FOR THE TEST
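
The arithmetic for parts (a) and (b) can be sketched as follows. It assumes the interest measure is the ratio s(a, b) / (s(a) × s(b)), often called lift; confirm the definition against the book. Variable names are illustrative.

```python
from fractions import Fraction

# Supports given in the exercise
s_a = Fraction(25, 100)
s_b = Fraction(90, 100)
s_ab = Fraction(20, 100)
min_conf = Fraction(60, 100)

# Confidence of {a} -> {b}: fraction of transactions containing a that also contain b
confidence = s_ab / s_a

# Interest (assumed definition): observed joint support relative to what
# independence would predict; below 1 suggests negative correlation
interest = s_ab / (s_a * s_b)

print(confidence, confidence >= min_conf)  # 4/5 True
print(interest, float(interest))
```

Comparing `confidence` with the threshold and `interest` with 1 gives the two views that part (c) asks you to reconcile.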

Chapter 7 exercises

5. For the data set with the attributes given below, describe how you would convert it into a binary transaction data set appropriate for association analysis. Specifically, indicate for each attribute in the original data set:

(a) How many binary attributes it would correspond to in the transaction data set,

(b) How the values of the original attribute would be mapped to values of the binary attributes, and

(c) If there is any hierarchical structure in the data values of an attribute that could be useful for grouping the data into fewer binary attributes.

The following is a list of attributes for the data set along with their possible values. Assume that all attributes are collected on a per-student basis:

• Year : Freshman, Sophomore, Junior, Senior, Graduate: Masters, Graduate: PhD, Professional

• Zip code : zip code for the home address of a U.S. student, zip code for the local address of a non-U.S. student

• College : Agriculture, Architecture, Continuing Education, Education, Liberal Arts, Engineering, Natural Sciences, Business, Law, Medical, Dentistry, Pharmacy, Nursing, Veterinary Medicine

• On Campus : 1 if the student lives on campus, 0 otherwise

• Each of the following is a separate attribute that has a value of 1 if the person speaks the language and a value of 0 otherwise:

- Arabic
- Bengali
- Chinese Mandarin
- English
- Portuguese
- Russian
- Spanish
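
As an illustration of part (b), the sketch below maps one categorical attribute (Year) to one binary attribute per value. This is only one possible encoding, and the helper `to_binary_items` is hypothetical; attributes that are already 0/1, such as On Campus and the language flags, would not need this step.

```python
YEAR_VALUES = [
    "Freshman", "Sophomore", "Junior", "Senior",
    "Graduate: Masters", "Graduate: PhD", "Professional",
]

def to_binary_items(attribute, value, categories):
    """Map a categorical value to one binary attribute per category."""
    return {f"{attribute}={c}": int(c == value) for c in categories}

record = to_binary_items("Year", "Junior", YEAR_VALUES)
print(record["Year=Junior"], sum(record.values()), len(record))  # 1 1 7
```

Exactly one of the seven binary attributes is 1 for any student. Grouping values, for example merging the two Graduate values, would reduce the number of binary attributes, which is the question raised in part (c).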

Chapter 8 exercises

1. Consider a data set consisting of 2^(20) data vectors, where each vector has 32 components and each component is a 4-byte value. Suppose that vector quantization is used for compression and that 2^(16) prototype vectors are used. How many bytes of storage does that data set take before and after compression and what is the compression ratio?
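
The storage arithmetic can be sketched as below. Two assumptions are made that the exercise does not state explicitly: the compressed form stores the full table of prototype vectors, and each data vector is replaced by a 16-bit (2-byte) index into that table.

```python
from fractions import Fraction

n_vectors = 2**20
n_prototypes = 2**16
components = 32
bytes_per_component = 4

# Uncompressed: every vector stored in full
before = n_vectors * components * bytes_per_component

# Compressed (assumed layout): prototype table plus one 2-byte index per vector
index_bytes = 2  # 16 bits are enough to address 2**16 prototypes
codebook = n_prototypes * components * bytes_per_component
after = codebook + n_vectors * index_bytes

ratio = Fraction(before, after)
print(before, after, float(ratio))  # 134217728 10485760 12.8
```

If the prototype table is not counted as part of the compressed data, the ratio changes, so state your assumption in the answer.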

8. Consider the mean of a cluster of objects from a binary transaction data set. What are the minimum and maximum values of the components of the mean? What is the interpretation of components of the cluster mean? Which components most accurately characterize the objects in the cluster?
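
A small made-up cluster helps make the question concrete. In the sketch below, each row is a transaction and each column is an item; the data are invented for illustration.

```python
# Four transactions over three items (1 = item present); invented data
cluster = [
    [1, 0, 1],
    [1, 1, 0],
    [1, 0, 0],
    [1, 0, 1],
]

def cluster_mean(rows):
    """Component-wise mean of a list of equal-length binary vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

mean = cluster_mean(cluster)
print(mean)  # [1.0, 0.25, 0.5]
```

Each component is the fraction of transactions in the cluster that contain the corresponding item, so it always lies between 0 and 1.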

9. Give an example of a data set consisting of three natural clusters, for which (almost always) K-means would likely find the correct clusters, but bisecting K-means would not.

11. Total SSE is the sum of the SSE for each separate attribute. What does it mean if the SSE for one variable is low for all clusters? Low for just one cluster? High for all clusters? High for just one cluster? How could you use the per variable SSE information to improve your clustering?
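
To see what per-variable SSE looks like, the sketch below splits the SSE of each cluster by attribute. The clusters and the function name are invented for illustration, and SSE is taken with respect to each cluster's own mean.

```python
def per_attribute_sse(points):
    """SSE of one cluster, reported separately for each attribute."""
    n = len(points)
    columns = list(zip(*points))
    centroid = [sum(col) / n for col in columns]
    return [sum((x - c) ** 2 for x in col) for col, c in zip(columns, centroid)]

# Two invented clusters with two attributes each
clusters = {
    "A": [(1.0, 10.0), (3.0, 10.0)],
    "B": [(5.0, 0.0), (5.0, 4.0)],
}
sse_by_cluster = {name: per_attribute_sse(pts) for name, pts in clusters.items()}
total_sse = sum(sum(values) for values in sse_by_cluster.values())

print(sse_by_cluster)  # {'A': [2.0, 0.0], 'B': [0.0, 8.0]}
print(total_sse)       # 10.0
```

Here the second attribute has zero SSE in cluster A but a high SSE in cluster B, which is the kind of per-cluster contrast the exercise asks you to interpret.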

13. The Voronoi diagram for a set of K points in the plane is a partition of all the points of the plane into K regions, such that every point (of the plane) is assigned to the closest point among the K specified points. (See Figure 8.38.) What is the relationship between Voronoi diagrams and K-means clusters? What do Voronoi diagrams tell us about the possible shapes of K-means clusters?
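
The assignment rule in the definition above is the same rule K-means uses in its assignment step, which the sketch below illustrates with Euclidean distance. The centroids and points are invented, and `nearest_centroid` is a hypothetical helper.

```python
import math

def nearest_centroid(point, centroids):
    """Return the index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))

centroids = [(0.0, 0.0), (4.0, 0.0), (2.0, 3.0)]             # invented centroids
points = [(0.5, 0.2), (3.6, -0.4), (2.1, 2.5), (1.9, 0.1)]   # invented data

labels = [nearest_centroid(p, centroids) for p in points]
print(labels)  # [0, 1, 2, 0]
```

Every point that receives label i lies in the Voronoi region of centroid i, which is a useful starting point for reasoning about the shapes the clusters can take.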

Discussion

List three (3) advantages and three (3) disadvantages that cloud-based providers have with respect to security, using them as LEVEL 1 HEADINGS. Then, in YOUR OWN words, and from an IT security manager’s perspective, explain how each advantage can help your business succeed and how each disadvantage can hurt business operations.

Op Excellence

Select an organization that has a global platform (it operates in more than one country) and that has demonstrated operational excellence. In this paper, perform the following activities:

Name the organization and briefly describe what good or service they sell and where they operate.

Note how they are a differentiator in the market.

Note the resources used to ensure success in their industry (remember resources are comprised of more than just people).

Explain what actions the company took to achieve operational excellence.

Discussion

1. Why are the original/raw data not readily usable by analytics tasks? What are the main data preprocessing steps? List and explain their importance in analytics.

2. What are the privacy issues with data mining? Do you think they are substantiated?

Integration

 

This week’s article provided a case study approach that highlights how businesses have integrated Big Data Analytics with their Business Intelligence to gain dominance within their respective industries. Search the UC Library and/or Google Scholar for a “Fortune 1000” company that has been successful in this integration. Discuss the company, its approach to big data analytics with business intelligence, what it is doing right, what it is doing wrong, and how it can improve to be more successful in the implementation and maintenance of big data analytics with business intelligence.

Your paper should meet the following requirements:

  • Be approximately four to six pages in length, not including the required cover page and reference page.
  • Follow APA 7 guidelines. Your paper should include an introduction, a body with fully developed content, and a conclusion.

me-15

 (1) How do you measure how Amazon has delivered value for its shareholders?    
 

Cloud Computing

 

For this project, select an organization that has leveraged Cloud Computing technologies in an attempt to improve profitability or to give them a competitive advantage.  Research the organization to understand the challenges that they faced and how they intended to use Cloud Computing to overcome their challenges.  The paper should include the following sections each called out with a header.

• Company Overview:  The section should include the company name, the industry they are in and a general overview of the organization.
• Challenges: Discuss the challenges the organization had that limited their profitability and/or competitiveness and how they planned to leverage Cloud Computing to overcome their challenges.
• Solution: Describe the organization’s Cloud Computing implementation and the benefits they realized from the implementation. What was the result of implementing Cloud Computing? Did they meet their objectives or fall short?
• Conclusion: Summarize the most important ideas from the paper and also make recommendations on how they might have achieved even greater success.

Requirements:

The paper must adhere to APA guidelines, including Title and Reference pages. There should be at least three scholarly sources listed on the reference page. Each source should be cited in the body of the paper to give credit where due. Per APA, the paper should use a 12-point Times New Roman font, should be double spaced throughout, and the first sentence of each paragraph should be indented 0.5 inches. The body of the paper should be 3 to 5 pages in length. The Title and Reference pages do not count towards the page count requirements.