Welcome

Professor Imielinski

Tomasz Imieliński is a professor of computer science at Rutgers University. He served as chairman of the computer science department at Rutgers from 1996 to 2003. In 2000 he co-founded Connotate Technologies, a web data extraction company based in New Brunswick, NJ. From 2004 to 2010 he held multiple positions at Ask.com, ranging from vice president of data solutions to executive vice president of global search and answers and chief scientist. He has also served as VP of Data Solutions at IAC.

Course Summary

Big data, algorithms, and statistics are everywhere today. But how do you tell good data from bad? Misinformation from useful analysis? And who owns the information about our lives and decisions?

Data 101 will help you improve your data literacy and develop a healthy skepticism about empirical claims present in the popular media. We will explore examples of erroneous, rushed and ad hoc conclusions based on so-called “big data”, and you will get hands-on experience analyzing and using data to make persuasive arguments.

You will also learn to make more informed decisions about what you find and share online. Along the way, you will learn the fundamental concepts in statistics and probability that are required to argue persuasively using data, such as: data exploration, data visualization, hypothesis testing, difference of means, null and alternative hypotheses, permutation test, z-test, z-value, critical value, significance level, p-value, Bonferroni correction, chi-square test, independence (if time allows), Bayesian reasoning, prior odds, posterior odds, likelihood ratio, false positive, true positive, prediction, cross-validation, decision trees, linear regression, recursive partitioning (rpart), linear regression MSE, prediction accuracy, training, testing, aggregate data paradoxes (Simpson's paradox, ecological fallacy, prosecutorial fallacy), normal distribution, Central Limit Theorem, and power law distribution.

Meanwhile, you will acquire basic programming skills in R that will benefit you in your future coursework and beyond. R is a statistical software environment and programming language that we’ll use to analyze and visualize datasets. Learning basic R will take some work; however, if you’re able to master the basics covered in this class, you’ll gain a concrete, marketable skill that may well prove useful in your academic and professional life.