According to Dr. Stephen Hawking and the conservation of quantum information theory, information can neither be created nor destroyed… unless you work in IT. OK, he didn’t really say the part about IT; I did.
I see the terms data and information used interchangeably, but I differentiate them. To me, information is data that informs the user in some meaningful way. In other words, data is everywhere, is neither created nor destroyed, and some data is also information. Data contains the noise; information is the signal.
In the physical world, data is constantly curated and consumed, from emails to cat videos to this blog post, not to mention the error messages, system logs, and alert emails you never read. And this digital data is never destroyed, leading to an inundation of ROT: redundant, outdated, trivial information.
We are not generating more data today than yesterday. The data already existed; just read Dr. Hawking's theories on black holes and baby universes. The difference is that today we have more access to data, and it is easier to store it for use later, maybe.
But just because you have the ability to curate data does not mean you should. Doing so leads to a bad case of ROT, and ROT has legitimate costs. Those costs are not just storage and compute, but also the increased risk of the data being lost in a breach we all know will happen at some point.
We should stop collecting data simply because it is available. We should shift our focus to collecting information, not data, and collect the information that answers the exact question you want answered. Storing data just in case it might be useful doesn't make you smart; it makes you a data hoarder.
Think of data like the food buffet at the Golden Corral. Take all you can eat, but eat all you take.
And stay away from the chocolate fountain data lake.
Community Links
The Most Costly Big Data Mistakes You Should Avoid
I would vote for "collecting more data than you need" as the worst mistake to make. Not just because of the direct cost, but because of the hidden costs of keeping data you don't need for longer than necessary and inevitably having it breached.
Has Working at Home Actually Led to Longer Hours?
As someone who has lived at work for the past 11 years, the answer is "yes". It is difficult, but not impossible, to set boundaries when you work from home.
Events
The agenda for Live! 360 and SQL Server Live! has been released; you can review all the details here.
Live! 360 brings the IT, Developer, and Data communities together for six days of training, knowledge sharing, and networking. With unlimited access to Live! 360’s five co-located events, you and your team will get the training you need to keep you and your business competitive and future-ready.
Send any questions about the event to me at SQLRockstar@thomaslarock.com
Data Janitor Roundup
How Data Science Pinpointed the Creepiest Word in “Macbeth”
I read this piece and developed an appreciation for both the research and for the Bard's ability to creep us out in the simplest way possible.
Data Literacy
The need for critical thinking is the top reason why data science is seen as a difficult profession. This image also shows why data literacy isn't the right phrase, and why we should think more about data fluency. A data professional needs to be fluent in many aspects of data.
FBI watchlist exposed by misconfigured Elasticsearch cluster
Once again we discover how humans are really bad at configuring basic security. We need to start holding people responsible for basic configuration mistakes resulting in data leakage.
Sponsor an Issue
Reach thousands of data professionals who care about data, databases, and helping others make their data the best version possible. Get in touch right here.