Last updated: August 31, 2024 at 03:37 PM
Query: Data Catalog
Azure Purview
- Description: Azure Purview is recommended for a Windows shop because it can scan on-premises SQL Server instances and Power BI. It is easy to set up and inexpensive to try out.
- Pros:
- Easy to set up
- Cost-effective for trying out
- Cons:
- Unclear how it handles ETL processes, so end-to-end lineage may be incomplete
Informatica's Data Catalog
- Description: Informatica's Data Catalog is recommended if budget is not a constraint.
- Pros:
- High-quality features
- Cons:
- Higher cost compared to other options
Secoda.io
- Description: Secoda.io is a relatively inexpensive option that commenters describe as having better features than the major players. It offers automated lineage between Power BI and SQL Server.
- Pros:
- Automated lineage
- Cost-effective
- Cons:
- Recommendation is specific to Power BI and SQL Server lineage
Open Source Data Catalogs
- Recommendation: See https://github.com/opendatadiscovery/awesome-data-catalogs for a list of open source data catalogs to draw inspiration from and test without heavy investment.
- Pros:
- Cost-effective
- Diverse range of options to test
- Cons:
- Some catalogs may lack key features
- Varied effectiveness based on needs
Amundsen
- Description: Amundsen is recommended as a free data catalog option.
- Pros:
- Cost-effective
- Cons:
- Commenters gave little detail on specific features
Collibra
- Description: Collibra is suggested as a data cataloging framework with good features.
- Pros:
- Effective UI with comprehensive details
- Cons:
- May be overkill for an older setup
Apache Atlas
- Description: Apache Atlas is mentioned as an alternative for data cataloging.
- Pros:
- Open source
- Cons:
- Commenters gave little detail on specific features
Excel/SharePoint
- Description: Some users catalog their data in an Excel workbook stored in SharePoint.
- Pros:
- Simple and accessible for smaller projects
- Cons:
- Manual maintenance overhead and limited functionality
Other Mentions
- Airbnb's Dataportal is praised for its UI and functionality; commenters hope it will be open-sourced soon.
- Custom-made solutions are mentioned for specific company needs.
- CKAN, AWS Glue Data Catalog, OpenMetadata, and DataHub are mentioned as options, depending on company size and needs.
- Informatica's Enterprise Data Catalog (EDC) and Alation are also mentioned as good options.
General Advice
- Consider company size, budget, and specific needs when choosing a data catalog.
- Building a catalog requires buy-in from IT, business units, and management for success.
- Tailored solutions may be more effective than generic software if they meet specific requirements.
- Do thorough user research and track usage before committing to a data catalog solution.
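One way to apply the advice above is a simple weighted scoring sheet. The sketch below is only an illustration: the criteria names, the weights, the 1-5 scores, and the placeholder names "Option A" and "Option B" are all hypothetical and do not come from the Reddit comments or reflect any real product.

```python
from dataclasses import dataclass


@dataclass
class Option:
    """A candidate catalog with hypothetical 1-5 scores per criterion."""
    name: str
    scores: dict[str, int]


def rank_options(options, weights):
    """Return (name, weighted average score) pairs, best first."""
    total_weight = sum(weights.values())
    ranked = []
    for opt in options:
        # A criterion missing from an option's scores counts as 0.
        weighted = sum(w * opt.scores.get(c, 0) for c, w in weights.items())
        ranked.append((opt.name, round(weighted / total_weight, 2)))
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# Hypothetical weights: this team cares most about budget, then lineage.
weights = {"budget_fit": 3, "lineage": 2, "ease_of_setup": 1}

# Placeholder candidates with made-up scores.
options = [
    Option("Option A", {"budget_fit": 5, "lineage": 2, "ease_of_setup": 4}),
    Option("Option B", {"budget_fit": 2, "lineage": 5, "ease_of_setup": 3}),
]

ranking = rank_options(options, weights)
for name, score in ranking:
    print(f"{name}: {score}")
```

Changing the weights to match your own company size, budget, and needs changes the ranking, which is the point: the same candidates can come out in a different order for different teams.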
This summary covers various recommendations and insights on data catalog options based on Reddit comments.