Articles
03/09/2020
Designing ETL Pipelines with Structured Streaming and Delta Lake: How to Architect Things Right (YouTube)
"I want my dashboard updated every second."
Could that be a mistake!?!? :)
24/08/2020
How to Calculate the Cost of Data Downtime
"One CDO I spoke with recently told me that his 500-person team spends 1,200 cumulative hours per week tackling data quality issues, time otherwise spent on activities that drive innovation and generate revenue (...)"
08/08/2019
Government formalizes public cloud adoption in contract with Embratel
"(...) The project with Embratel is part of the procurement of the federal government's first public cloud, put out to tender back in 2018 and first implemented last year. This first contract brings together 23 public agencies and has a projected cost of R$ 55 million."
02/06/2020
COVID-19 didn’t break your business. Data did.
(...) Not every enterprise stumbled, nor did every government. The defining factor wasn't how digital these public and private entities were. These surviving enterprises and governments embraced the data both pre-COVID-19 and during COVID-19 (...)
20/02/2020
How Amazon is solving big-data challenges with data lakes
"(...) A major reason companies choose to create data lakes is to break down data silos. Having pockets of data in different places, controlled by different groups, inherently obscures data. This often happens when a company grows fast and/or acquires new businesses. In the case of Amazon, it's been both (...)" (Werner Vogels)
26/09/2019
Optimize Your Amazon S3 Data Lake with S3 Storage Classes and Management Tools (YouTube)
"As your data lake grows, it becomes increasingly important to manage objects at scale and optimize storage costs and resources. In this tech talk, AWS experts provide an overview of S3's capabilities that allow you to manage data at the object, bucket, and account levels. Learn about and watch demos for S3 Batch Operations. Also learn cost-optimization best practices by storing objects across the S3 Storage Classes."
21/08/2019
Recommended Reading: "Factfulness: Ten Reasons We're Wrong about the World"
"I don’t love numbers. I am a huge, huge fan of data, but I don’t love it. It has its limits. I love data only when it helps me to understand the reality behind the numbers, i.e., people’s lives. In my research, I have needed the data to test my hypotheses, but the hypotheses themselves often emerged from talking to, listening to, and observing people. Though we absolutely need numbers to understand the world, we should be highly skeptical about conclusions derived purely from number crunching."
16/07/2019
Australia-wide AWS deal
The Australian government's attitude towards cloud has been very positive, and according to Amazon Web Services (AWS) Worldwide Public Sector Asia Pacific regional managing director Peter Moore, what's prevented an all-in approach has been legacy arrangements and a traditional approach to procurement (...)
16/07/2019
Machine Learning for Everyone
(...) Without all the AI-bullshit, the only goal of machine learning is to predict results based on incoming data. That's it. All ML tasks can be represented this way, or it's not an ML problem from the beginning. The greater variety in the samples you have, the easier it is to find relevant patterns and predict the result (...)
05/06/2019
AWS, Microsoft or Google: Which cloud computing giant is growing the fastest?
"Spending on cloud computing infrastructure continues to grow at a furious pace, but cloud vendors will have to work harder for their profits from now on. Global cloud infrastructure services market grew 42 percent year-on-year in the first quarter of 2019 with Amazon Web Services (AWS) making the biggest gain in dollar terms with sales up by $2.3 billion (41%) on Q1 2018, according to data from tech analyst firm Canalys. That performance put AWS further ahead of second-placed Microsoft, even though it grew sales by $1.5 billion or 75 percent. Google was the fastest growing of the top three in percentage terms, up 83 percent from $1.2 billion to $2.3 billion (...)"
05/06/2019
Cloud Data Warehouse Benchmark: Redshift, Snowflake, Azure, Presto and BigQuery
What data warehouse should I choose?
16/05/2019
How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh (Martin Fowler)
Many enterprises are investing in their next generation data lake, with the hope of democratizing data at scale to provide business insights and ultimately make automated intelligent decisions. Data platforms based on the data lake architecture have common failure modes that lead to unfulfilled promises at scale. To address these failure modes we need to shift from the centralized paradigm of a lake, or its predecessor data warehouse. We need to shift to a paradigm that draws from modern distributed architecture: considering domains as the first class concern, applying platform thinking to create self-serve data infrastructure, and treating data as a product. (...)
30/04/2019
Can You Be a Great Leader Without Technical Expertise? (Harvard Business Review)
"There is a broad assumption in society and in education that the skills you need to be a leader are more or less transferable... But recent research is rightly challenging this notion. Studies suggest that the best leaders know a lot about the domain in which they are leading, and part of what makes them successful in a management role is technical competence. For example, hospitals managed by doctors perform better than those managed by people with other backgrounds. And there are many examples of people who ran one company effectively and had trouble transferring their skills to the new organization (...)"
17/04/2019
IBM and Oracle are out of the running for $10 billion government cloud contract
"(...) AWS and Microsoft are the only companies that meet the minimum requirements for the contract, Defense Department spokesperson Elissa Smith told CNBC in an email on Thursday. The New York Times first reported the decision on Wednesday. (...)"
10/04/2019
Netanyahu's Speech on the Free Market (and Big Data, AI, and Innovation) (YouTube)
An interesting view from Israel's Prime Minister on Big Data, AI, and innovation. Would it work in Brazil?
02/04/2019
The future is monocloud or multicloud?
Volkswagen and Amazon Web Services are partnering on an industrial IoT cloud — and it highlights how automakers could become more efficient
12/03/2019
What Data Scientists Really Do, According to 35 Data Scientists
(...) Ethics is among the field’s biggest challenges (of Data Science) ... And we need to actually have proper licenses so that if you actually do something unethical, perhaps you have some kind of penalty, or disbarment, or some kind of recourse, something to say this is not what we want to do as an industry, and then figure out ways to remediate people who go off the rails and do things because people just aren’t trained and they don’t know (...)
20/02/2019
What Makes a Successful AI Company - Data Dominance
(...) To create a successful AI company you must create such a wide moat that no one can catch up unless they pay your price. That moat is not about technology. There are essentially no monopolies on deep learning technologies, only leaders that can quickly be copied. The secret to a wide moat in AI is to have a virtual monopoly on the data you are using to train. In this case monopoly also means such a large lead in users and data volume that no one can reasonably catch up. (...)
13/02/2019
Future of the Game: Baseball's Latest Statistical Revolution (YouTube)
It's time to embrace baseball's statistical revolution. The bandwagon hasn't left yet, there's still time to jump on. The advanced statistics and analytics that have played a part in transforming the way many look at America's Pastime are here to stay.
29/01/2019
No, Machine Learning is not just glorified Statistics
This meme has been all over social media lately, producing appreciative chuckles across the internet as the hype around deep learning begins to subside. The sentiment that machine learning is really nothing to get excited about, or that it’s just a redressing of age-old statistical techniques, is growing increasingly ubiquitous; the trouble is it isn’t true (...)
10/12/2018
AWS re:Invent 2018
Awesome content!! Fantastic speed of innovation!!
AWS re:Invent 2018 (Keynote with Andy Jassy)
AWS re:Invent 2018 (Keynote with Werner Vogels)
25/09/2018
Engineers Shouldn’t Write ETL: A Guide to Building a High Functioning Data Science Department
(...) If you read the recruiting propaganda of data science and algorithm development departments in the valley, you might be convinced that the relationship between data scientists and engineers is highly collaborative, organic, and creative. Just like peas and carrots. However, it’s not a well kept secret that this is seldom the case. Most shops foster a relationship between engineers and scientists that lies somewhere in the spectrum between non-existent and highly dysfunctional. (...)
20/09/2018
Why the Pentagon’s $10 billion JEDI deal has cloud companies going nuts
"By now you’ve probably heard of the Defense Department’s massive winner-take-all $10 billion cloud contract dubbed the Joint Enterprise Defense Infrastructure (or JEDI for short). Star Wars references aside, this contract is huge, even by government standards. The Pentagon would like a single cloud vendor to build out its enterprise cloud, believing rightly or wrongly that this is the best approach to maintain focus and control of their cloud strategy..."
11/09/2018
Magic Quadrant for Cloud Infrastructure as a Service, Worldwide (Gartner)
(...) Cloud computing is a style of computing in which scalable and elastic IT-enabled capabilities are delivered as a service using internet technologies. Cloud infrastructure as a service (IaaS) is a type of cloud computing service; it parallels the infrastructure and data center initiatives of IT. Cloud compute IaaS constitutes the largest segment of this market (the broader IaaS market also includes cloud storage and cloud printing). Only cloud compute IaaS is evaluated in this Magic Quadrant. (...)
29/08/2018
Why We Need More Domain Experts In The Data Sciences
(...) Precious few data scientists that I’ve met have deep backgrounds or rigorous training in the disciplines and domain areas in which they find themselves presently deployed. In many of the organizations I’ve worked with, data scientists are treated as spot problem solvers, moved rapidly across the entirety of the organization’s practices, analyzing deeply technical and nuanced problems in one domain and then addressing deeply complex issues in an entirely different domain the following day. Often spreadsheets are tossed over the fence to the data sciences group each morning and the results of the final model emailed back that afternoon with little interaction or communication among the producer and consumer of the data analytic pipeline. (...)
08/08/2018
13 Common Mistakes Amateur Data Scientists Make and How to Avoid Them?
(...) However, becoming a data scientist does not come easy. It needs a mix of problem solving, structured thinking, coding and various technical skills among others to be truly successful. If you are from a non-technical and non-mathematical background, there’s a good chance a lot of your learning happens through books and video courses. Most of these resources don’t teach you what the industry is looking for in a data scientist. This is one of the reasons why aspiring data scientists are struggling to bridge the gap between self education and real-world jobs. (...)
17/07/2018
Organizing Data Science Teams (podcast)
What are best practices for organizing data science teams? Having data scientists distributed through companies or having a Centre of Excellence? What are the most important skills for data scientists? Is the ability to use the most sophisticated deep learning models more important than being able to make good powerpoint slides?
14/06/2018
Think Like a Statistician – Without the Math
"I call myself a statistician, because, well, I’m a statistics graduate student. However, ask me specific questions about hypothesis tests or required sampling size, and my answer probably won’t be very good (...) Instead, the most important things I’ve learned are less formal, but have proven extremely useful when working/playing with data. Here they are in no particular order (...)"
06/06/2018
Cloud workloads, data lakes challenge information architecture
(...) I have become a strong believer in data lakes. They are a staging ground for the data warehouse and -- probably more important down the road -- they are a data bed for data science in the organization (...)
30/05/2018
Predictive Analytics in the Supply Chain
(...) Increasingly though, a requirement of high maturity is the ability to better foresee the future, anticipate future events, and make optimal tradeoffs based on intentional strategic choices of top management. In short, to be at the top of the game in Supply Chain Management now requires including advanced predictive analytics (...)
16/05/2018
Amazon delivers hefty profits, led by web services
"(...) 'AWS had the unusual advantage of a seven-year head start before facing like-minded competition, and the team has never slowed down,' Amazon founder and chief Jeff Bezos said in the first-quarter earnings release (...)"
25/04/2018
How Netflix Uses Big Data to Drive Success
"Netflix has over 100 million subscribers and with that comes a wealth of data they can analyse to improve the user experience. Big data has helped Netflix massively in their mission to become the king of stream."
18/04/2018
Do You Trust This Computer?
An excellent documentary about the future and the "dangers" of Artificial Intelligence, with interviews with heavyweights such as Elon Musk. Alarmist or not, it is worth watching.
05/04/2018
The data engineering ecosystem
"(...) Managed Services on the rise. On a slightly more contentious note, ‘serverless’ offerings are also a developing trend. There is a growing desire for data teams like The New York Times to architect pipelines without the effort of managing any underlying infrastructure. While production use-cases for these services have been relatively limited, the features they offer are continuing to improve. With services like AWS’s S3, Redshift, Athena, EMR, Kinesis, and Lambda, as well as GCP’s BigQuery, Pub/Sub, and DataProc, the major cloud providers are clearly investing in these full-service solutions (...)"
28/03/2018
Data Engineering Ecosystem Map
This iteration provides a streamlined view of the core components of data pipelines, while enabling deeper exploration of the complex world of distributed system technologies.
13/03/2018
Why every company will soon need a data refinery
(...) Unlike oil, companies no longer have to find where the value is, plenty of companies are sitting on virtual oil reserves. But even a huge horde of data won’t magically turn into value. That requires a data refinery, and a new set of tools to find and extract value. (...)
05/03/2018
Book: Creating a Data-Driven Enterprise with DataOps - Insights from Facebook, Uber, LinkedIn, Twitter, and eBay
Required reading for anyone working on Big Data & Analytics strategies or projects at their company... ;)
01/01/2020
Big Data Analytics on Amazon Web Services (AWS)
For sure, Big Data gets easier this way... ;)
18/02/2018
DataOps: Building A Next Generation Data Engineering Organization
(...) If you ask most employees, they likely believe their company’s data is neatly organized and easily accessible. As enterprise data professionals know, the reality is that a typical data environment resembles a “random data salad.” For decades, companies have been idiosyncratically deploying systems for business process automation, with the data generated from these deployments treated mainly as “exhaust” to the business processes. The resulting data environment is deeply fragmented and virtually impossible to integrate at scale — crushing the hopes of companies that want to develop an analytical advantage. (...)
15/02/2018
Forbes: 10 Predictions For AI, Big Data, And Analytics in 2018
(...) Forrester expects 50% of enterprises to embrace a public-cloud-first policy in 2018 for data, big data, and analytics, as they look for more control over costs and more flexibility than on-premises software can deliver. (...)
08/02/2018
Ford Motor Company: Data and the future of autonomous vehicles
"(...) There's a need for new frameworks around which to think about data. There is transportation infrastructure data, data from the vehicle, from passengers in the vehicle, from other vehicles, as well as data providers. It's a very complex ecosystem. We tend to think about the data vehicles produce, but this goes far beyond that (...) You have to understand that we are not talking about a single cloud. This is not about Ford's cloud versus BMW's cloud or GM's cloud. Ford will have a cloud, but Delphi will also have a cloud, and weather.com will have a cloud. So, even for data that's going outside the vehicle to this type of infrastructure, the data decisions are very complex (...)"
30/01/2018
The Best Data Scientists Get Out and Talk to People
(...) Not all data scientists spend enough time understanding the deeper reality they study. Some concentrate too much on the numbers. For example, in predicting the most recent election, the place to be was in the mind of the potential voter. You can’t go there directly, so many individuals and publications, from the New York Times to the Princeton Election Commission, used polls to predict who would win. But most were way off. (...)
24/01/2018
Putting a Price on Farm Data
(...) With the increasing availability of computer resources, new techniques of analysis have been developed to analyze data in general from which agricultural research has much to benefit. Research data accumulated for years can now be assessed via data mining and experimentation optimized with the help of simulation. (...)
15/01/2018
Labs join forces and identify R$ 44 billion with signs of money laundering in Brazil over a decade
Interesting, I didn't know Brazil had its own NSA. Perhaps a centralized structure would be more appropriate, no?
10/01/2018
Agile is Dead (Dave Thomas) (YouTube)
Re-founding Agile or returning to its origins!?!
Dave Thomas, one of the fathers of the Agile Manifesto, talking about the myths that have been created and the "forgetting" of its values.
Is Agile dead!?! ;)
03/01/2018
AlphaGo (documentary on Netflix)
No one believed that a machine (Artificial Intelligence) could beat a human at the game of Go. For those interested in the subject, this documentary (newly released on Netflix) is worth watching.
20/12/2017
No Data, Just Questions Please
(...) Questions provide context, questions lead to relationships, questions expand your horizon, questions lead you business savvy, and in doing all that, and more, questions provide that magical missing ingredient: Purpose. How do you distinguish between raw data requests (which lead to more data puking by you) and questions (which lead to the desirable combo of analysis and action)? (...)
12/12/2017
re:Invent 2017 (AWS)
Here are the playlists from AWS re:Invent 2017, held in early December, with first-rate content on:
Big Data & Analytics (34 presentations) -> https://goo.gl/3xGC2w
Machine Learning (35 presentations) -> https://goo.gl/YCFnmN
Architecture (33 presentations) -> https://goo.gl/UU9A6y
IoT (20 presentations) -> https://goo.gl/PdYLpC
Enjoy it!! ;)
22/11/2017
Why Your Cloud Strategy Is Failing? – The Cloud Adoption Mistakes
(...) During my interaction over the past couple of years, I have realized that most of the customers who have started their business on cloud as well as the customers who have moved applications from on-premise, make many similar mistakes. Additionally, customers that move from on-premise often fail to unlearn a lot of things that can make their transition to cloud an overall success for a lower TCO (...)
13/11/2017
Data processing startup Databricks raises $140 million
Analytics and AI startup Databricks has raised $140 million in new financing led by Andreessen Horowitz. New Enterprise Associates and Battery Ventures also participated in the funding round. The San Francisco-based company plans to use the new funding to bolster its enterprise analytics platform, accelerate growth strategy, and hire more engineering and customer service employees (...)
07/11/2017
Case UPS: Using Analytics to improve performance
(...) Once Orion is fully operational for more than 55,000 drivers this year, it will lead to a reduction of about 100 million delivery miles -- and 100,000 metric tons of carbon emissions. Perez says these reductions represent a key measure of business efficiency and effectiveness, particularly in terms of sustainability (...)
04/10/2017
Big Data Executive Survey 2017 (download)
"(...) If there is any sobering trend in these results, it lies in the apparent difficulty of organizational and cultural change around Big Data. More than 85% of respondents report that their firms have started programs to create data-driven cultures, but only 37% report success thus far. Big Data technology is not the problem; management understanding, organizational alignment, and general organizational resistance are the culprits. If only people were as malleable as data (...)"
12/09/2017
Moves by the major Cloud players (UOL)
(...) At Amazon, in turn, there are rumors of two major changes that completely invert the logic adopted so far. The first major change is a possible agreement with VMware, its arch-enemy in on-premises infrastructure management. VMware is the absolute leader in the datacenter market. This move by Amazon looks to me like a response to Microsoft and its private cloud offering, called Azure Stack, scheduled for public release in September of this year. That version was already available for testing and validation, but now it is going on the shelf, literally (...)
29/08/2017
Case: Big Data for Transportation and Traffic optimization (webinar)
"From automated vehicles to ride hailing apps, transportation as we know it is changing - and fast. But new technologies alone won't help communities build the efficient, equitable, and sustainable transportation networks communities want. In fact, these innovative technologies could do just the opposite, especially if they are not deployed wisely. Cities must collect the right data and enact the right policies to ensure they do not exacerbate problems like inequity and traffic, and to hold themselves accountable to the promise of new mobility technologies (...)"
23/08/2017
Things Are Holding Back Your Analytics, and Technology Isn’t One of Them
"(...) Consider the experience of one retail financial services firm. There, the analytics function was comprised of employees who used specialized software packages exclusively and specified complicated functional forms whenever possible. At the same time, the group eschewed traditional business norms such as checking in with clients, presenting results graphically, explaining analytic results in the context of the business, and connecting complex findings to conventional wisdom. The result was an isolated department that business partners viewed as unresponsive, unreliable, and not to be trusted with critical initiatives. On the other hand, analysts who are too deeply embedded in business functions tend to be biased toward the status quo or leadership’s thinking (...)"
16/08/2017
Case: Operação Serenata de Amor (fighting corruption)
A practical example of how data analysis work can produce results in the fight against corruption. We should have a group like this for every state, every city hall, and every condominium. Congratulations to the participants!!! PS: Note that Operation Lava Jato (Car Wash) has also made intensive use of Big Data & Analytics concepts and, as a result, has already repatriated more than 11 billion reais.
08/08/2017
Are Analytics truly Self-Service?
"(...) The press and some vendors tend to discuss analytical expertise as binary—rank amateurs vs. experienced professional analysts, or Ph.D. data scientists vs. “citizen data scientists.” You probably already realize that the world is a little more complex than that. Neither all amateurs nor all professionals are created equal. There is a continuum of expertise about almost every phase of analytics. Some “amateurs” may not know when to employ logistic regression, but may be quite wise about how to frame a decision and how to communicate the results of analyses in a way that inspires trust and action. And the most sophisticated statistician or data scientist may be lacking in some of those same attributes (...)"
01/08/2017
Case: Big Data in healthcare
(...) The large volumes of data all these systems produce, even where integrations are able to be engineered, get expensive to store on conventional data warehouse systems. That leads to having only the most recent data stored there, further reducing analytical visibility. And even if you could solve that problem, you'd need to do it again for other healthcare constituencies, like the payers (health insurance carriers), the pharmaceutical companies and the medical equipment manufacturers (...)
18/07/2017
IT Is Not Analytics. Here’s Why.
(...) Analytics, big data, and data science are relatively new terms. Rarely do you walk into an organization and find an analytics or data science team, or even one person solely responsible for turning data into actionable information to improve the business. In fact, most have not even developed robust reporting solutions that enable the business to progress with relatively low human interaction. Rather, they have bolted reporting and analytics responsibilities onto the already busy schedules of their IT professionals. This is a mistake (...)
11/07/2017
How FC Barcelona uses football player data to win games
Soccer clubs are using more and more data, and those in the Spanish professional league are no exception. Most teams have realized the importance of performance analysis and learned how to utilize data to win games and make more money (...)
05/07/2017
When Big Data becomes Big Fail (UOL)
The São Paulo state government is going to relaunch its main security system, which still does not work as promised, almost three years after its official presentation and with about R$ 30 million involved. The program, named Detecta and imported from New York, was announced in 2014 during Governor Geraldo Alckmin's (PSDB) re-election campaign as the most modern crime-fighting tool in the world (...)
20/06/2017
Caterpillar: How Predictive Maintenance Saves Millions Of Dollars with Big Data
When it comes to big data and Internet of Things (IoT) initiatives most companies are still in the design or early adoption phases which make it hard to get a solid return on investment (ROI) figures. So it’s refreshing to share a story of an organization delivering real-world ROI for their customers by vastly ramping up their data collection and predictive maintenance analytics (...)
13/06/2017
If Your Company Isn’t Good at Analytics, It’s Not Ready for AI
Management teams often assume they can leapfrog best practices for basic data analytics by going directly to adopting artificial intelligence and other advanced technologies. But companies that rush into sophisticated artificial intelligence before reaching a critical mass of automated processes and structured analytics can end up paralyzed. They can become saddled with expensive start-up partnerships, impenetrable black-box systems, cumbersome cloud computational clusters, and open-source toolkits without programmers to write code for them (...)
01/01/2020
Good news: CIOs have stopped fighting the Cloud
(...) You can see that shift in a study by Trustmarque that shows more than nine in ten U.K. CIOs and IT decision-makers polled said they plan to migrate their organizations on-premises workloads to the cloud within five years. The study polled 200 CIOs and senior IT decision-makers in enterprises with more than 1,000 employees. (...)
29/05/2017
Big Data At Caesars Entertainment - A One Billion Dollar Asset?
(...) The most valuable of the individual assets being fought over by creditors is the data collected over the last 17 years through the company’s Total Rewards loyalty program, which gained Caesar’s a reputation as a pioneer in Big Data-driven marketing and customer service. Total Rewards is estimated to be worth over $1 billion. (...) “We use database marketing and decision-science-based analytical tools to widen the gap between us and casino operators who base their customer incentives more on intuition than evidence," said Loveman way back in 2003 in the Harvard Business Review (...)
22/05/2017
Has the Hadoop market turned a corner?
(...) The cloud will be key to making Big Data - and Hadoop - accessible to the next wave of adopters who won't have the technical savvy or resources of the early adopters. We expect that by year-end 2018, most new Hadoop implementations will be in the cloud (...)
15/05/2017
How to think like a data scientist to become one
"We have all read the punchlines – data scientist is the sexiest job, there’s not enough of them and the salaries are very high. The role has been sold so well that the number of data science courses and college programs are growing like crazy. After my previous blog post I have received questions from people asking how to become a data scientist – which courses are the best, what steps to take, what is the fastest way to land a data science job? ... "
09/05/2017
Amazon Redshift Spectrum – Exabyte-Scale In-Place Queries of S3 Data
"(...) In order to allow you to process your data as-is, where-is, while taking advantage of the power and flexibility of Amazon Redshift, we are launching Amazon Redshift Spectrum. You can use Spectrum to run complex queries on data stored in Amazon Simple Storage Service (S3), with no need for loading or other data prep. (...)"
02/05/2017
Analytics on offense: How to build a Data strategy
A big Wall Street bank’s Chief Digital Officer and an analytics thought leader walk into a bar.
The analytics person says, “So, tell me about your data strategy.” The CDO nods. After some careful consideration, he replies, “Yeah, we still need to decide what we’re going to do with our data.” This is the anti-punchline of analytics: Every big enterprise is collecting troves of data, but few have agreed on what to do with it or how to operationalize that information (...)
25/04/2017
Follow the CAPEX: Cloud Table Stakes
(...) The magnitude of these expenditures (in cloud) is even more impressive when compared to some of both the biggest companies on the planet and the biggest spenders on CAPEX. It is stunning that Microsoft now outspends Intel on CAPEX. (...)
17/04/2017
Cloud's Market Share Battle and How it can Hurt your Business
It's the year 2017 and there is a fierce and aggressive battle for market share. Even though Amazon Web Services (AWS) has a respectable lead, we can never count Microsoft out, having proven again and again (except in the mobile market fiasco) that when their machine sets their eyes on a prize, they almost always get it. In this article I will expose some of the consequences of the ever growing battle between cloud providers. The good, the bad and the ugly...
27/03/2017
What is the difference between Data Engineering and Data Science?
If you're interested in the field of analytics, you've probably heard the terms Data Engineering and Data Science, but do you know the difference? Although there has historically been considerable overlap between the two professions, they are each becoming more distinct. DataCamp created an infographic to help you understand the skills and responsibilities of each role ...
10/03/2017
Spark is the future of Analytics
"(...) Heudecker (Gartner) closed his presentation with the pronouncement that he had no idea whether or not Spark is the future of data analysis, and bolted the venue faster than a jackrabbit on Ecstasy. Which begs the question: why pay big bucks for analysts who have no opinion about one of the most active projects in the Big Data ecosystem? Here are eight reasons why Spark has a central role in the future of analytics..."
10/03/2017
Google spent $30 billion on its cloud and is making some undeniable progress
"(...) Amazon is still setting the tone for the entire cloud computing infrastructure market and has already won today's enterprise. But that doesn't mean there's no place for Google, especially if Google is playing the long game (which it says it's doing). In the 16 months since Greene joined Google, she's definitely showing progress."
06/03/2017
New Zealand bank replaces SAS server with R Server
Heartland Bank, a rapidly growing bank in New Zealand, has adopted a data-driven approach to analyzing risk, evaluating credit lines, and understanding cash flows. But they found their legacy SAS system to be labor-intensive and time consuming when it came to updating financial models, and it was expensive to boot. (Being licensed on a per-user basis, it was available only to a small group of staff in IT.) The bank wanted an analytics platform that could support future innovation, and so Heartland Bank replaced SAS with Microsoft R Server and SQL Server...
20/02/2017
How Analytics is Making Basketball a More Beautiful Game
The NBA is the #1 global sports league -- and analytics is helping make it even better. The NBA London Global Games 2017 were held last week, accompanied by an NBA-sponsored Leaders Meet: Innovation technology event in London earlier in the day. The event was kicked off by Steve Hellmuth, NBA's EVP Media Operations and Technology, who talked about how new technology is helping optimize the fan experience...
14/02/2017
Data Science and Statistics: different worlds? (Youtube)
Chris Wiggins (Chief Data Scientist, New York Times)
David Hand (Emeritus Professor of Mathematics, Imperial College)
Francine Bennett (Founder, Mastodon-C)
Patrick Wolfe (Professor of Statistics, UCL / Executive Director, UCL Big Data Institute)
Zoubin Ghahramani (Professor of Machine Learning, University of Cambridge)
06/02/2017
Building Data Science Teams or DSaaS - Data Science as a Service?
"(...) building a successful data science team requires two essential soft skills - curiosity and business thought process (...)"
23/01/2017
Why R is the best Data Science language to learn today
In last week’s blog, I explained why you should Master R (even if it may eventually become obsolete). I wrote that article to address people who claim mastering R is a bit of a waste of time (because it will eventually become obsolete). But when I suggested that R may eventually become obsolete, this seemed to provoke fear that R is becoming obsolete right now. I want to allay your fears: R is still very popular. R has been one of the fastest growing programming languages of the last decade...
18/01/2017
Big Data is About Agility
As a buzzword, the phrase “big data” summons many things to mind, but to understand its real potential, look to the businesses creating the technology. Google, Facebook, Microsoft, and Yahoo are driven by very large customer bases, a focus on experimentation, and a need to put data science into production. They need the ability to be agile, while still handling diverse and sizable data volumes...
09/01/2017
Introducing the Data Lake Solution on AWS
Many of our customers choose to build their data lake on AWS. They find the flexible, pay-as-you-go, cloud model is ideal when dealing with vast amounts of heterogeneous data. While some customers choose to build their own lake, many others are supported by a wide range of partner products. Today, we are pleased to announce another choice for customers wanting to build their data lake on AWS: the data lake solution. The solution is provided as an AWS CloudFormation script that you can use out-of-the-box, or as a reference implementation that can be customized to meet your unique data management, search, and processing needs.
03/01/2017
Make R a Legitimate Part of Your Organization
In many organizations, R enters through the back door when analysts download the free software and install it on their local workstations. Whether you are an analyst wanting to do more, a stakeholder wanting a competitive analytic platform, or an IT professional wanting a controlled and secured environment, you should make R a legitimate part of your organization and get the resources needed to support it...
26/11/2016
The rise of cloud culture: A sign of further maturation?
The idea that IT leaders and their teams should pay attention to broader issues outside of their direct remit has been on the agenda of many businesses for years. IT should play a much greater role, it has been argued, than its traditional speciality of ‘keeping the technology lights on’. But for many, the scope to achieve greater levels of integration has been limited, not least by the need to control IT investment and drive down costs...
09/11/2016
Amazon doubles its public cloud lead, can anyone catch up?
There are clear signs that Microsoft Azure is making gains against Amazon Web Services (AWS). But, based on recent data from Synergy Research, it's not nearly enough. As reported by Liam Tung, AWS has double the market share of its next three largest competitors combined. While that isn't as dramatic as AWS formerly boasting 10X the utilized capacity of its largest 14 competitors, it's still a sign that, in the public cloud, winner takes most...
03/11/2016
What’s better: Amazon’s Availability Zones vs. Microsoft Azure’s regions
Although they both offer core IaaS features like virtual machines, storage and databases, the leading public cloud providers, Amazon Web Services and Microsoft Azure, take very different approaches in offering cloud services, including at the most basic level of how their data centers are constructed and positioned around the world...
27/10/2016
A Side-by-Side Comparison of AWS, Google Cloud and Azure
Three main players of business cloud services have an array of products covering all you can possibly need for your online operations. But there are differences not only in pricing but also in how they name and group their services, so let’s compare one next to another and find out what they offer. We’ll focus on services provided by Amazon Web Services (AWS), Google Cloud Platform (GCP) and Microsoft Azure. We won’t cover all of them, or get into much detail about the infrastructure of cloud computing. However, you will be exposed to many of the products you can use, and hopefully get familiar with some cloud concepts...
24/10/2016
What’s better: Amazon’s Availability Zones vs. Microsoft Azure’s regions
There are important differences in how these clouds are built. Although they both offer core IaaS features like virtual machines, storage and databases, the leading public cloud providers, Amazon Web Services and Microsoft Azure, take very different approaches in offering cloud services, including at the most basic level of how their data centers are constructed and positioned around the world.
13/10/2016
To the cloud, big data sisters and brothers, to the cloud
While reports of big data's death have been greatly exaggerated, the skepticism is not unwarranted. The cloud may have some of the answers, but it won't solve all of big data's problems...
26/09/2016
Powerpoint is evil
Imagine a widely used and expensive prescription drug that promised to make us beautiful but didn't. Instead the drug had frequent, serious side effects: It induced stupidity, turned everyone into bores, wasted time, and degraded the quality and credibility of communication. These side effects would rightly lead to a worldwide product recall...
11/09/2016
Where is the Science in "Data Science"?
William Edwards Deming said: In God We Trust; All Others Must Bring Data. Here, we will explore how we “bring” data scientifically into decision-making. It is quite surprising to see many practitioners in the field apply the latest and greatest in tools and technologies to fairly large and complex datasets, and then find their results discarded by decision-makers because the science of the domain remained to be addressed ...
05/09/2016
Five Questions about Data Science
Recently, we were able to ask five questions of Murtaza Haider, about the new book from IBM Press called “Getting Started with Data Science: Making Sense of Data with Analytics.” Below, the author talks about the benefits of data science in today’s professional world...
15/08/2016
Open Source Winning Against Proprietary Data Science Vendors
With the recent publication of Gartner’s Magic Quadrant for Advanced Analytics, we wanted to know how proprietary data science software vendors were faring against open source challengers. We discovered compelling evidence that open source tools have had a dramatic impact on SAS, IBM, Microsoft and others...
09/08/2016
Magic Quadrant for Cloud Infrastructure as a Service, Worldwide (Gartner)
The market for cloud IaaS has consolidated significantly around two leading service providers. The future of other service providers is increasingly uncertain and customers must carefully manage provider-related risks...
29/07/2016
Top Programming Languages Trends: The Rise of Big Data
(...) Another language that has continued to move up the rankings since 2014 is R, now in fifth place. R has been lifted in our rankings by racking up more questions on Stack Overflow—about 46 percent more since 2014. But even more important to R’s rise is that it is increasingly mentioned in scholarly research papers (...)
19/07/2016
A comprehensive comparison of Jupyter vs. Zeppelin
No choice is not good. But life could be complicated with too many choices (especially when we have no idea how to make a decision). As a lifelong student of data science and technology in general, I usually run into challenges of what tool to use and fall in love with. That's why I'm writing this post to help learners like myself. I'm not going to talk about the commercial technologies (such as Adatao) and only focus on open source alternatives. Why? I like free stuff...
14/07/2016
How to make sure your Hadoop data lake doesn't become a swamp
The term "data lake" has been popular for a few years now, particularly in the context of Hadoop-based systems for large-scale data processing. But as Constellation Research VP and principal analyst Doug Henschen notes in an in-depth new report, it's no simple task to create a data lake that lives up to the concept's potential ...
21/03/2016
The Future of Data Warehousing (by Cloudera)
Traditional data warehouse ETL has become too slow, too complicated, and too expensive to address the torrent of new data sources and new analytic approaches needed for decision making. The new ETL environment is already looking drastically different...
08/03/2016
How Time-to-Insight Is Driving Big Data Business Investment
They say that time is money, but Fortune 1000 executives polled in the fourth annual Big Data Executive Survey conducted by NewVantage Partners have boldly confirmed that reducing time-to-insight rather than saving money is the primary driver for their Big Data business investment...
02/03/2016
The Role of Big Data in the Travel Industry in 2016
There isn’t an industry that big data hasn’t touched. In the modern digital age, big data is impacting everything – including travel. And while the role of big data may be more obvious and transparent in other industries, it’s clear that the future of travel and big data will be permanently intertwined for the better ...
23/02/2016
Top 8 Big Data trends for 2016 (by Tableau)
This past year was an important one for Big Data. We saw more businesses accepting that data, in all forms and sizes, is critical for the best possible decision-making. In support of this, we’ll continue to see the systems that support non-relational or unstructured forms of data, as well as massive data volumes, evolve and mature to operate well inside of Enterprise IT systems. This will enable business users, along with data scientists, to fully realize and unlock the value in big data...
08/02/2016
Apache Spark rises to become most active open source project in Big Data
Adoption interest in Spark has topped MapReduce, says a new survey. What's supporting interest is the need for speed, boosting agility, and revenues...
04/02/2016
Magic Quadrant for Business Intelligence and Analytics Platforms
The BI and analytics platform market's multiyear shift from IT-led enterprise reporting to business-led self-service analytics has passed the tipping point. Most new buying is of modern, business-user-centric platforms forcing a new market ...