Abstract

Big data technology offers unprecedented opportunities to society as a whole and to its individual members. At the same time, this technology poses significant risks to those it overlooks. In this article, we give an overview of recent technical work on diversity, particularly in selection tasks, discuss connections between diversity and fairness, and identify promising directions for future work that will position diversity as an important component of a data-responsible society. We argue that diversity should come to the forefront of our discourse for reasons that are both ethical (to mitigate the risks of exclusion) and utilitarian (to enable more powerful, accurate, and engaging data analysis and use).
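Diversity in a selection task means choosing a small subset of items that is both relevant and non-redundant. One formulation from the information retrieval literature is Maximal Marginal Relevance (MMR; Carbonell and Goldstein, 1998). The sketch below is a greedy selection in that style. It is an illustration only and is not presented as the authors' method. The function name `mmr_select`, the trade-off parameter `lam`, and the caller-supplied `relevance` and `similarity` functions are hypothetical choices made for this example.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def mmr_select(
    items: Sequence[T],
    relevance: Callable[[T], float],
    similarity: Callable[[T, T], float],
    k: int,
    lam: float = 0.5,
) -> List[T]:
    """Hypothetical helper: greedily pick up to k items, trading relevance
    against redundancy in the style of MMR. Not the authors' method."""
    remaining = list(items)
    selected: List[T] = []
    while remaining and len(selected) < k:
        def score(x: T) -> float:
            # Redundancy is the similarity to the closest already-selected item
            # (0.0 when nothing has been selected yet).
            redundancy = max((similarity(x, s) for s in selected), default=0.0)
            # lam = 1.0 reduces to plain relevance ranking;
            # smaller lam rewards items unlike those already chosen.
            return lam * relevance(x) - (1 - lam) * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Usage: items are (id, topic, relevance) tuples; two items are "similar"
# (similarity 1.0) when they share a topic.
docs = [("a1", "sports", 0.9), ("a2", "sports", 0.85),
        ("b1", "politics", 0.6), ("c1", "science", 0.5)]
picked = mmr_select(docs, lambda d: d[2],
                    lambda x, y: 1.0 if x[1] == y[1] else 0.0, k=3)
print([d[0] for d in picked])  # ['a1', 'b1', 'c1']
```

With `lam=0.5`, the second sports item is passed over in favor of items from other topics. With `lam=1.0`, the same call returns the plain relevance top-3 (`a1`, `a2`, `b1`). This contrast is the trade-off that diversity-aware selection makes explicit.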



Cite this article as: Drosou M, Jagadish HV, Pitoura E, Stoyanovich J (2017) Diversity in big data: a review. Big Data 5:2, 73–84, DOI: 10.1089/big.2016.0054.

Published In

Big Data
Volume 5, Issue 2, June 2017
Pages: 73–84
PubMed: 28632443

History

Published in print: June 2017
Published online: 1 June 2017

Authors

Affiliations

Marina Drosou
Department of Computer Science, University of Ioannina, Ioannina, Greece.
H.V. Jagadish
Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, Michigan.
Evaggelia Pitoura
Department of Computer Science, University of Ioannina, Ioannina, Greece.
Julia Stoyanovich*
Department of Computer Science, Drexel University, Philadelphia, Pennsylvania.

Notes

* Address correspondence to: Julia Stoyanovich, Department of Computer Science, Drexel University, Philadelphia, PA 19104.

Author Disclosure Statement

No competing financial interests exist.