mastodontech.de is one of many independent Mastodon servers you can use to participate in the Fediverse.
Open to everyone (16+) and provided by Markus'Blog

Server statistics: 1.5K active profiles

#syntheticdata


Synthetic data—realistic yet artificial—helps organisations overcome data shortages, privacy risks and compliance challenges. It enables safer AI model training, testing edge cases, and simulating new markets, but should complement, not fully replace, real data. #SyntheticData #AI #Innovation #DataScience #Privacy levelact.com/how-synthetic-dat

LevelAct · How Synthetic Data is Powering the Next Wave of AI and Innovation
Enterprises are generating more data than ever, yet many are still data-starved when it comes to fueling next-gen applications, training…

Can AI Be Trained on Data Generated by Other AI? Exploring the Potential and Pitfalls of Synthetic Training Data
AI-generated training data is revolutionizing AI model training! Synthetic data simulates real-world scenarios, offering a more efficient approach. Companies like Anthropic are already using it. Learn more about this exciting new frontier! #SyntheticData #AIGeneration #AItraining #DataScience #MachineLearning #FutureofAI
tech-champion.com/data-science...

Continued thread

Synthetic data generation with GPT-4o was a game changer for us. By creating datasets with common misspellings and syntactic variations, we were able to enhance the robustness of our search models significantly. This crucial step ensured that our AI models could handle a variety of real-world inputs seamlessly. #SyntheticData #Innovation
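The post does not share its GPT-4o prompts or pipeline. As a rough, rule-based stand-in (not the method the post describes), the sketch below builds query variants containing one misspelled word, using single-character deletions, adjacent transpositions, substitutions and insertions in the style of Peter Norvig's spelling-corrector edits. The names `single_edits` and `augment_query` are hypothetical, and only lowercase ASCII typos are generated.

```python
import random
import string

ALPHABET = string.ascii_lowercase  # assumption: lowercase ASCII queries only


def single_edits(word: str) -> set[str]:
    """All strings exactly one character edit away from `word` (Norvig-style)."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {left + right[1:] for left, right in splits if right}
    transposes = {left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1}
    replaces = {left + c + right[1:] for left, right in splits if right for c in ALPHABET}
    inserts = {left + c + right for left, right in splits for c in ALPHABET}
    # Drop edits that reproduce the original word (e.g. replacing a letter with itself).
    return (deletes | transposes | replaces | inserts) - {word}


def augment_query(query: str, n_variants: int = 3, seed: int = 0) -> list[str]:
    """Return `n_variants` noisy copies of `query`, each with exactly one misspelled word."""
    rng = random.Random(seed)
    words = query.split()
    variants = []
    for _ in range(n_variants):
        idx = rng.randrange(len(words))
        # Sort so the choice is reproducible regardless of set iteration order.
        candidates = sorted(single_edits(words[idx]))
        noisy = words.copy()
        noisy[idx] = rng.choice(candidates)
        variants.append(" ".join(noisy))
    return variants


if __name__ == "__main__":
    for v in augment_query("wireless headphones", n_variants=5):
        print(v)
```

Pairing each variant with the original query's relevance labels is one plausible way to turn this into training or evaluation data for a search model; an LLM-based generator could add realistic variations that character edits cannot, such as reordered or paraphrased terms.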

Replied in thread

In this way, we _can_ leverage sensitive data in research in an easy and robust way.

But even this less ambitious goal is difficult to put into practice. So how do we democratize research with sensitive data? Make #SyntheticData more #Accessible!

#DataScience community, let's focus on creating user-friendly software, teaching our colleagues, and, perhaps most importantly, putting this into real-world practice.

Image below adapted from the Center for Open Science

3/n

Continued thread

In the short paper, I argue that we are moving towards the fundamental barrier of the privacy-fidelity trade-off in #SyntheticData: the more it looks like the real data, the more privacy risk it incurs.
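One way to make the trade-off concrete, assuming a single categorical column and the Laplace mechanism from differential privacy (a swapped-in textbook example, not the method from the paper): add noise of scale 1/ε to each histogram count, then sample synthetic records from the noisy histogram. Smaller ε gives a stronger privacy guarantee but noisier counts, so the synthetic distribution tends to drift further from the real one, measured here by total variation distance. `dp_histogram_synth` and `total_variation` are hypothetical names, and the category list is assumed to be public rather than read from the data.

```python
import random
from collections import Counter


def laplace(rng: random.Random, scale: float) -> float:
    """Draw Laplace(0, scale) noise as the difference of two i.i.d. exponentials."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def dp_histogram_synth(records, categories, epsilon, n_out, seed=0):
    """Sample `n_out` synthetic records from a Laplace-noised histogram of `records`.

    Assumes add/remove-one neighbouring datasets, so the count vector has L1
    sensitivity 1 and noise of scale 1/epsilon makes the release epsilon-DP.
    Clipping, reweighting and sampling are post-processing. Illustration only:
    `random` is not a secure noise source for a production DP system.
    """
    rng = random.Random(seed)
    counts = Counter(records)
    noisy = [max(counts[c] + laplace(rng, 1 / epsilon), 0.0) for c in categories]
    if sum(noisy) == 0:  # noise clipped every count to zero: fall back to uniform
        noisy = [1.0] * len(categories)
    return rng.choices(categories, weights=noisy, k=n_out)


def total_variation(a, b, categories):
    """Total variation distance between the empirical distributions of a and b."""
    ca, cb = Counter(a), Counter(b)
    return 0.5 * sum(abs(ca[c] / len(a) - cb[c] / len(b)) for c in categories)


if __name__ == "__main__":
    real = ["A"] * 700 + ["B"] * 250 + ["C"] * 50  # toy data, not from the paper
    cats = ["A", "B", "C"]
    for eps in (0.001, 0.01, 0.1, 1.0):
        synth = dp_histogram_synth(real, cats, eps, n_out=1000)
        print(f"epsilon={eps}: TV distance {total_variation(real, synth, cats):.3f}")
```

Across the ε values in the loop, the distance typically shrinks as ε grows, that is, as the privacy guarantee weakens: fidelity is bought with privacy.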

Is it all hopeless then? Nope! We can use currently available methods to generate synthetic data on the very private end of the spectrum, and use that to make it easy to write code, try out preliminary analyses, and create #Reproducible science.
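As an illustration of the very private end, and only an assumption about what such a generator might look like, the hypothetical `marginal_synth` below samples every column independently from its own empirical distribution and drops values seen fewer than `min_count` times. The output keeps the schema and per-column frequencies, which is often enough to write and debug analysis code, while deliberately discarding relationships between columns. It is not a formal privacy guarantee; a differentially private variant would also add noise to the marginals.

```python
import random
from collections import Counter


def marginal_synth(rows, n_out, min_count=5, seed=0):
    """Sample each column independently from its own empirical marginal.

    `rows` is a list of dicts sharing the same keys. Values seen fewer than
    `min_count` times are dropped so rare (potentially identifying) values are
    not copied out. Joint structure between columns is deliberately destroyed.
    Not a formal privacy guarantee.
    """
    rng = random.Random(seed)
    marginals = {}
    for col in rows[0]:
        counts = Counter(r[col] for r in rows)
        kept = {v: c for v, c in counts.items() if c >= min_count}
        if not kept:
            raise ValueError(f"column {col!r} has no value with count >= {min_count}")
        marginals[col] = (list(kept), list(kept.values()))
    return [
        {col: rng.choices(values, weights=weights)[0]
         for col, (values, weights) in marginals.items()}
        for _ in range(n_out)
    ]


if __name__ == "__main__":
    # Toy data: age band and smoking status are perfectly linked here.
    real = ([{"age_band": "30-39", "smoker": "no"}] * 40
            + [{"age_band": "60-69", "smoker": "yes"}] * 40
            + [{"age_band": "90-99", "smoker": "yes"}] * 2)
    for row in marginal_synth(real, n_out=5):
        print(row)
```

In the toy data the synthetic rows mix age band and smoking status freely, and the rare `90-99` band never appears. A script written against this output could later be run unchanged on the real data by someone who has access to it.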

2/n

📢 New #OpenAccess paper 📢

#SyntheticData has (had) a lot of hype. Its promise: replace sensitive real data with #Privacy-friendly synthetic data and train all your #AI on that instead. However, despite much effort, we rarely see it used successfully in practice.

It's a shame, because beyond the hype there is actually a lot of value in synthetic data as a privacy-enhancing technology, with a more humble promise: a tool to reduce access barriers to sensitive data.

cell.com/patterns/fulltext/S26

1/n