Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

ManojkMohan
by Contributor III
  • 55 Views
  • 3 replies
  • 2 kudos

Ingesting 100 TB raw CSV data into the Bronze layer in Parquet + Snappy

Problem I am trying to solve: Bronze is the landing zone for immutable, raw data. At this stage, I am trying to use a columnar format (Parquet or ORC) → good compression, efficient scans, and then apply lightweight compression (e.g., Snappy) → balances...
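A minimal sketch of the Bronze write path described above, assuming hypothetical ADLS paths and a header row; for a 100 TB load you would supply an explicit schema rather than inferring one:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical landing path; supply an explicit schema at this scale.
raw = (
    spark.read
    .option("header", "true")
    .csv("abfss://landing@myaccount.dfs.core.windows.net/raw_csv/")
)

(
    raw.write
    .mode("append")
    .option("compression", "snappy")  # Snappy is also Spark's Parquet default
    .parquet("abfss://bronze@myaccount.dfs.core.windows.net/events/")
)
```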

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 2 kudos

Hi @ManojkMohan, to jump into the conversation, is there any particular reason why you don't want to load that CSV to Delta format? Delta has multiple advantages over regular Parquet. Things like file skipping and predicate pushdown filtering are much more pe...
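For comparison, a sketch of the same landing write targeting Delta instead of plain Parquet, as the reply suggests (reusing the `raw` DataFrame from the sketch above; the table name is hypothetical):

```python
(
    raw.write
    .format("delta")
    .mode("append")
    .saveAsTable("bronze.raw_events")  # hypothetical UC table
)
```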

2 More Replies
Sainath368
by New Contributor III
  • 19 Views
  • 1 reply
  • 0 kudos

Is Photon Acceleration Helpful for All Maintenance Tasks (OPTIMIZE, VACUUM, ANALYZE_COMPUTE_STATS)?

Hi everyone, we're currently reviewing the performance impact of enabling Photon acceleration on our Databricks jobs, particularly those involving table maintenance tasks. Our job includes three main operations: OPTIMIZE, VACUUM, and ANALYZE_COMPUTE_S...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

Hi @Sainath368, I wouldn't use Photon for this kind of task. You should use it primarily for ETL transformations, where it shines. VACUUM and OPTIMIZE are more of maintenance tasks and using Photon would be pricey overkill here. According to documentatio...
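For reference, the three maintenance operations the thread discusses are plain SQL and can run on a small non-Photon cluster; a sketch with a hypothetical table name (VACUUM retention must respect the 7-day default minimum):

```python
spark.sql("OPTIMIZE main.sales.transactions")
spark.sql("VACUUM main.sales.transactions RETAIN 168 HOURS")
spark.sql("ANALYZE TABLE main.sales.transactions COMPUTE STATISTICS")
```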

merca
by Valued Contributor II
  • 11946 Views
  • 13 replies
  • 7 kudos

Value array {{QUERY_RESULT_ROWS}} in Databricks SQL alerts custom template

Please include in documentation an example how to incorporate the `QUERY_RESULT_ROWS` variable in the custom template.

Latest Reply
CJK053000
New Contributor III
  • 7 kudos

Databricks confirmed this was an issue on their end and it should be resolved now. It is working for me.

12 More Replies
felix4572
by Visitor
  • 72 Views
  • 5 replies
  • 1 kudos

transformWithStateInPandas throws "Spark connect directory is not ready" error

Hello, we employ arbitrary stateful aggregations in our data processing streams on Azure Databricks, and would like to migrate from applyInPandasWithState to transformWithStateInPandas. We employ the Python API throughout our solution, and some of our...
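A minimal sketch of the target API, based on the Spark 4.0 StatefulProcessor interface (the names, schema, and running-count logic are illustrative assumptions, not the poster's code; requires a runtime where transformWithStateInPandas is supported):

```python
import pandas as pd
from typing import Iterator
from pyspark.sql.streaming import StatefulProcessor, StatefulProcessorHandle
from pyspark.sql.types import StructType, StructField, StringType, LongType

output_schema = StructType([
    StructField("id", StringType()),
    StructField("count", LongType()),
])

class CountProcessor(StatefulProcessor):
    def init(self, handle: StatefulProcessorHandle) -> None:
        # Per-key value state holding a single LONG counter.
        state_schema = StructType([StructField("count", LongType())])
        self._count = handle.getValueState("count", state_schema)

    def handleInputRows(self, key, rows, timerValues) -> Iterator[pd.DataFrame]:
        prev = self._count.get()[0] if self._count.exists() else 0
        total = prev + sum(len(pdf) for pdf in rows)
        self._count.update((total,))
        yield pd.DataFrame({"id": [key[0]], "count": [total]})

    def close(self) -> None:
        pass

# `df` stands in for the poster's streaming DataFrame.
result = (
    df.groupBy("id")
    .transformWithStateInPandas(
        statefulProcessor=CountProcessor(),
        outputStructType=output_schema,
        outputMode="Update",
        timeMode="None",
    )
)
```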

Latest Reply
felix4572
Visitor
  • 1 kudos

Dear @szymon_dybczak and @-werners-, thank you a lot for your responses and references! @-werners-, thank you for the link to the announcement article. The availability section lists that "No-Isolation and Unity Catalog Dedicated Clusters" are s...

4 More Replies
dbdev
by New Contributor II
  • 529 Views
  • 8 replies
  • 3 kudos

Maven libraries in VNet injected, UC enabled workspace on Standard Access Mode Cluster

Hi! As the title suggests, I want to install Maven libraries on my cluster with access mode 'Standard'. Our workspace is VNet injected and has Unity Catalog enabled. The coordinates have been allowlisted by the account team according to these instructio...
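A sketch of installing allowlisted Maven coordinates programmatically with the Databricks Python SDK; the cluster id and ojdbc8 version are placeholders:

```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.compute import Library, MavenLibrary

w = WorkspaceClient()
w.libraries.install(
    cluster_id="0901-123456-abcdefgh",  # placeholder cluster id
    libraries=[
        Library(maven=MavenLibrary(
            coordinates="com.oracle.database.jdbc:ojdbc8:23.3.0.23.09"  # placeholder version
        ))
    ],
)
```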

Latest Reply
dbdev
New Contributor II
  • 3 kudos

@nayan_wylde @szymon_dybczak I just tried using a JAR I uploaded to an allowlisted Volume (ojdbc8 of Oracle) and I get the same error. It seems like I'm able to install a JAR, but once it's installed my cluster is broken.

7 More Replies
Vamsi_S
by Visitor
  • 34 Views
  • 1 reply
  • 0 kudos

Ingest data from SQL Server

I've been working on data ingestion from SQL Server to UC using Lakeflow Connect. Lakeflow Connect actually made the work easier when everything is right. I am trying to incorporate this with DAB and this would work fine with schema and table tags fo...

Latest Reply
Khaja_Zaffer
Contributor
  • 0 kudos

Hello @Vamsi_S, good day! Did you try preprocessing table names in CI/CD and generating the YAML dynamically (recommended for dynamic, automated ingestion)? Did you contact your Databricks account manager (in case you are working with a company) for a feature request...
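A sketch of the "generate YAML dynamically" idea the reply mentions: a CI step that writes a bundle include file from a table list. The resource shape below is an assumption; adapt it to the Lakeflow Connect pipeline spec your bundle uses:

```python
import yaml  # PyYAML

# Hypothetical: table names discovered or configured in a CI step.
tables = ["dbo.orders", "dbo.customers"]

resource = {
    "resources": {
        "pipelines": {
            "sqlserver_ingest": {
                "name": "sqlserver-ingest",
                # The exact ingestion-spec keys are an assumption; check the
                # Lakeflow Connect docs for the pipeline definition schema.
                "configuration": {"source_tables": ",".join(tables)},
            }
        }
    }
}

# Written where the bundle's include globs will pick it up.
with open("resources/ingest.generated.yml", "w") as f:
    yaml.safe_dump(resource, f, sort_keys=False)
```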

shan-databricks
by New Contributor III
  • 42 Views
  • 2 replies
  • 2 kudos

How to load all the previous day's data only into the newly added column of the existing delta table

How to load all the previous day's data only into the newly added column of the existing delta table? Is there any option available to do that without writing any logic?

Latest Reply
BS_THE_ANALYST
Honored Contributor III
  • 2 kudos

@shan-databricks there are certainly ways for the schema to evolve within your Delta tables that are supported out of the box: https://docs.databricks.com/aws/en/delta/update-schema#enable-schema-evolution To update older records, they'd likely have NULL...
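A sketch of the usual two-step pattern, with hypothetical table and column names: add the column (schema evolution covers new writes), then backfill the older rows that come back NULL:

```python
spark.sql("ALTER TABLE main.sales.orders ADD COLUMN load_date DATE")
spark.sql("""
    UPDATE main.sales.orders
    SET load_date = CAST(ingest_ts AS DATE)  -- derive from an existing column
    WHERE load_date IS NULL
""")
```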

1 More Replies
Phani1
by Valued Contributor II
  • 33 Views
  • 2 replies
  • 1 kudos

Cosmos DB metadata integration with Unity Catalog

Hi Team, how can we integrate Cosmos DB metadata with Unity Catalog? Can you please provide some insights on this? Regards, Phani

Latest Reply
Khaja_Zaffer
Contributor
  • 1 kudos

Hello @Phani1, good day! I have found a whole document on your requirement: https://community.databricks.com/t5/technical-blog/optimising-data-integration-and-serving-patterns-with-cosmos-db/ba-p/91977 It has a project with it as well.

1 More Replies
Datalight
by New Contributor II
  • 39 Views
  • 1 reply
  • 0 kudos

Resolved! How to build Data Pipeline to consume data from Adobe Campaign to Azure Databricks

Could someone please help me design the pipeline with Databricks? I don't have any control over Adobe. How to set up a data pipeline that moves CSV files from Adobe to ADLS Gen2 via a cron job, using Databricks? Where will this cron job execute? How ADLS ...
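One common pattern for the Databricks side, sketched with placeholder ADLS paths and table names: the cron job (wherever it runs) drops CSVs into a landing container, and an Auto Loader job picks them up incrementally:

```python
stream = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "csv")
    .option("cloudFiles.schemaLocation",
            "abfss://bronze@myaccount.dfs.core.windows.net/_schemas/adobe")
    .option("header", "true")
    .load("abfss://landing@myaccount.dfs.core.windows.net/adobe/")
)

(
    stream.writeStream
    .option("checkpointLocation",
            "abfss://bronze@myaccount.dfs.core.windows.net/_checkpoints/adobe")
    .trigger(availableNow=True)  # run on a schedule instead of continuously
    .toTable("bronze.adobe_campaign")
)
```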

Latest Reply
Khaja_Zaffer
Contributor
  • 0 kudos

Hello @Datalight, good day! Can I please know what you mean by "you don't have any control over Adobe"? I found a similar case study over here: https://learn.microsoft.com/en-us/answers/questions/5533633/data-pipeline-to-push-files-from-external-system...

SangNguyen
by New Contributor
  • 485 Views
  • 8 replies
  • 3 kudos

Cannot deploy DAB with the Job branch using a feature branch in Workspace UI

Hi, I tried to deploy a DAB in the Workspace UI with a feature branch (sf-trans-seq) targeted to Dev. After deploying successfully, the Job branch is, however, using the master branch (see the screenshot below). Is there any option to force the Job branch t...

[screenshot: DAB deployment issue in the Workspace UI]
Latest Reply
-werners-
Esteemed Contributor III
  • 3 kudos

I agree. Can you mark your (or someone else's) answer as solved? Because I think you won't be the only one with this issue/feature.

7 More Replies
xavier_db
by New Contributor
  • 35 Views
  • 1 reply
  • 0 kudos

MongoDB connection in GCP Databricks

I am trying to connect to MongoDB from Databricks, which is UC enabled, and both MongoDB and Databricks are in the same VPC. I am using the below code: df = ( spark.read.format("mongodb") .option( "connection.uri", f'''mongodb://{username}:{password...
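A reconstruction of the truncated snippet as a complete read, with placeholder secret scope, host, database, and collection names (the options are those of the MongoDB Spark connector):

```python
# Placeholders: secret scope/keys, host, database, and collection.
username = dbutils.secrets.get("mongo", "username")
password = dbutils.secrets.get("mongo", "password")

df = (
    spark.read.format("mongodb")
    .option("connection.uri",
            f"mongodb://{username}:{password}@mongo-host:27017/?authSource=admin")
    .option("database", "mydb")
    .option("collection", "mycollection")
    .load()
)
```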

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

Hi @xavier_db, standard access mode has more limitations compared to dedicated access mode. For example, look at the limitations list of standard access mode: Standard compute requirements and limitations | Databricks on AWS. Now, compare it to dedicated...

fix_databricks
by New Contributor II
  • 3459 Views
  • 3 replies
  • 0 kudos

Cannot run another notebook from same directory

Hello, I am having a similar problem to the one in this thread, which was never resolved: https://community.databricks.com/t5/data-engineering/unexpected-error-while-calling-notebook-string-matching-regex-w/td-p/18691 I renamed a notebook (utility_data_wrangli...

Latest Reply
ddundovic
New Contributor III
  • 0 kudos

I am running into the same issue. It seems like the `%run` magic command is trying to parse the entire cell content as its arguments. So if you have `%run "my_notebook"` and `print("hello")` in the same cell, you will get the following error: `Failed to parse...
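The fix the reply implies is to give `%run` its own cell; a sketch with a hypothetical notebook path:

```python
# Cell 1 -- %run must be the only content of the cell:
# %run ./my_notebook

# Cell 2 -- everything else goes in a separate cell:
print("hello")
```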

2 More Replies
Anubhav2011
by New Contributor
  • 284 Views
  • 2 replies
  • 1 kudos

What is the Power of DLT Pipeline to read streaming data

I am getting thousands of records every second in my bronze table from Qlik, and every second the bronze table is getting truncated and loaded with new data by Qlik itself. How do I process this much data every second to my silver streaming table before...

Latest Reply
ManojkMohan
Contributor III
  • 1 kudos

Core problem: the Bronze table is not append-only, but truncate + insert every second. DLT (Delta Live Tables) in continuous mode assumes append-only streaming sources (like Kafka). Because Qlik wipes and replaces data every second, DLT cannot guarantee no d...
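One hedged sketch of how a stream can tolerate the non-append commits, with a hypothetical table name: skipChangeCommits ignores commits that rewrite or delete existing rows. Note that if Qlik replaces data with a single overwrite commit rather than a delete followed by appends, this would silently skip those rows, so validate against the actual commit pattern:

```python
stream = (
    spark.readStream.format("delta")
    .option("skipChangeCommits", "true")  # skip the truncate/delete commits
    .table("bronze.qlik_landing")         # hypothetical source table
)
```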

1 More Replies
ScottH
by New Contributor
  • 158 Views
  • 1 reply
  • 0 kudos

Installing Marketplace Listing via Python SDK...

I am trying to use the Databricks Python SDK to install a Databricks Marketplace listing to Unity Catalog. I am getting stuck on how to provide a valid consumer terms version when passing the "accepted_consumer_terms" parameter to the w.consumer_inst...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

Hi @ScottH, it took me about 2 hours to get it right, but here it is. You need to provide a valid date. And you may ask, where is that date coming from? It's coming from the consumer listing: listings = w.consumer_listings.get(id= 'e913bea3-9a37-446c...
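A sketch of the flow the reply describes, using the Databricks Python SDK. The listing id is truncated in the post, so a placeholder is used; the import path for ConsumerTerms and the exact field carrying the terms date are assumptions, so inspect the printed listing response for the actual value:

```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.marketplace import ConsumerTerms  # import path is an assumption

w = WorkspaceClient()
listing_id = "e913bea3-9a37-446c-..."  # truncated in the post; placeholder

# Per the reply, the valid terms version (a date) comes from the listing itself.
listing = w.consumer_listings.get(id=listing_id)
print(listing)  # inspect the response for the consumer terms version/date

w.consumer_installations.create(
    listing_id=listing_id,
    accepted_consumer_terms=ConsumerTerms(version="2025-01-01"),  # date taken from the listing
)
```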

billfoster
by New Contributor II
  • 24521 Views
  • 10 replies
  • 7 kudos

How can I learn Databricks

I am currently enrolled in a data engineering boot camp. We go over various technologies: Azure, PySpark, Airflow, Hadoop, NoSQL, SQL, Python. But not over something like Databricks. I am in contact with lots of recent graduates who landed a job. Almo...

Latest Reply
mosinjack
New Contributor
  • 7 kudos

I’m also learning Databricks and totally get what you mean—it can feel like a lot to take in at first, but breaking it down into smaller steps makes it manageable. For me, the official docs and community edition were a solid starting point, but I fou...

9 More Replies
