Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

MattM
by New Contributor III
  • 2714 Views
  • 0 replies
  • 0 kudos

Unstructured Data - PDF and a semi-structured data

I have a scenario where one source is unstructured PDF files and another source is semi-structured JSON files. I receive files from these two sources daily in ADLS storage. What is the best way to load this into a medallion structure by s...

Antoine_De_A
by New Contributor III
  • 3443 Views
  • 1 replies
  • 3 kudos

Resolved! Streaming data to CosmosDB

Hello everyone, here is the problem I am facing. I'm currently working on streaming data to Databricks. My goal is to create a data stream in a first notebook, and then in a second notebook to read this data stream, add all the new rows to a DataFrame...

Latest Reply
Antoine_De_A
New Contributor III
  • 3 kudos

Problem solved! Instead of trying to do everything directly with the .writeStream options, I used the .foreachBatch() function, which lets me call a function outside the .writeStream(). In this function I receive a DataFrame as a parameter, which is my str...
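
The pattern described in this reply might look roughly like the sketch below. It is not the poster's actual code, which is cut off above. The helper name make_batch_writer, the format name, and the option keys are hypothetical placeholders; substitute whatever your sink connector documents.

```python
def make_batch_writer(sink_format, sink_options):
    """Build a function suitable for DataStreamWriter.foreachBatch.

    sink_format and sink_options are placeholders (assumptions): use the
    format name and options documented by your sink connector.
    """
    def write_batch(batch_df, batch_id):
        # batch_df is an ordinary (non-streaming) DataFrame holding one
        # micro-batch, so the regular batch writer API can be used on it.
        (batch_df.write
            .format(sink_format)
            .options(**sink_options)
            .mode("append")
            .save())
    return write_batch

# Usage in a notebook, assuming stream_df is a streaming DataFrame:
# writer = make_batch_writer("my.sink.format", {"someOption": "value"})
# query = stream_df.writeStream.foreachBatch(writer).start()
```

Because the function runs once per micro-batch, any batch-only logic (deduplication, merges, writes to several sinks) can live inside write_batch.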

curious-case-of
by New Contributor II
  • 11607 Views
  • 1 replies
  • 4 kudos

Databricks notebook taking too long to run as a job compared to when triggered from within the notebook

I don't know if this question has been covered earlier, but here it goes: I have a notebook that I can run manually using the 'Run' button in the notebook or as a job. The runtime when I run from within the notebook directly is roughly 2 hours. But w...

Latest Reply
wvl
New Contributor II
  • 4 kudos

We're seeing the same behavior. Performance is good on an interactive cluster, but bad on an identically sized job cluster. Any ideas?

data_engineer_0
by New Contributor II
  • 15497 Views
  • 3 replies
  • 2 kudos

How to run the .py file in databricks cluster

Hi team, I want to run the command below in Databricks and also need to capture the error and success messages. Please help me out here. Thanks in advance. Example: python3 /mnt/users/code/x.py --arguments
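
One possible way to run a script and capture both outcomes from a notebook cell is the standard library's subprocess module. This is a minimal sketch, not an official Databricks recipe; run_script is a hypothetical helper name, and the command in the usage comment is the one from the question.

```python
import subprocess
import sys

def run_script(args, timeout=None):
    """Run a command and return (succeeded, stdout, stderr)."""
    result = subprocess.run(
        args,
        capture_output=True,  # collect stdout and stderr instead of printing
        text=True,            # decode bytes to str
        timeout=timeout,      # seconds; None means wait indefinitely
    )
    # By convention, exit code 0 means success.
    return result.returncode == 0, result.stdout, result.stderr

# Usage with the command from the question:
# ok, out, err = run_script(["python3", "/mnt/users/code/x.py", "--arguments"])
# print("success" if ok else "failure", out, err)
```

The success message is whatever the script writes to stdout; error text and tracebacks arrive on stderr.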

Latest Reply
User16764241763
Honored Contributor
  • 2 kudos

Hello @Piper Wilson, would this task not help? https://docs.databricks.com/dev-tools/api/latest/examples.html#jobs-api-examples

2 More Replies
User15787040559
by Databricks Employee
  • 3353 Views
  • 1 replies
  • 0 kudos

MicrosoftTeams-image

ERROR Max retries exceeded with url: /api/2.0/jobs/runs/get?run_id= Failed to establish a new connection. This error can happen when exceeding the rate limits for all REST API calls as documented here. In the image shown for example we're using the Jobs...
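
A common client-side mitigation for rate-limit errors like this is to retry with exponential backoff. The sketch below is generic and makes assumptions: call_with_backoff is a hypothetical helper, request_fn stands for whatever function issues your API call, and retry_on should be set to the exception types your HTTP client actually raises.

```python
import random
import time

def call_with_backoff(request_fn, max_attempts=5, base_delay=1.0,
                      retry_on=(ConnectionError,), sleep=time.sleep):
    """Call request_fn(), retrying with exponential backoff and jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return request_fn()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Base delay doubles on each attempt (1s, 2s, 4s, ...); random
            # jitter of up to half the delay spreads out concurrent clients.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay / 2))
```

Polling a run status less often (for example every 30 seconds rather than every second) also reduces the number of calls in the first place.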

Latest Reply
User16764241763
Honored Contributor
  • 0 kudos

Hi @Carlos Morillo, are you facing this issue consistently or only when you run a lot of jobs? We are internally tracking a similar issue. Could you please file a support request with Microsoft Support? Databricks and MSFT will collaborate and provide upd...

chandan_a_v
by Valued Contributor
  • 26524 Views
  • 7 replies
  • 3 kudos
Latest Reply
Prabakar
Databricks Employee
  • 3 kudos

By any chance, was the cluster restarted after installing the libraries or was it detached and reattached from/to the notebook? Notebook-scoped libraries do not persist across sessions. You must reinstall notebook-scoped libraries at the beginning of...

6 More Replies
Gopal_Sir
by New Contributor III
  • 38301 Views
  • 5 replies
  • 7 kudos

Resolved! How to convert a string column to Array of Struct?

I have a nested struct where one of the fields is a string. It looks something like this: string = "[{\"to_loc\":\"6183\",\"to_loc_type\":\"S\",\"qty_allocated\":\"18\"},{\"to_loc\":\"6137\",\"to_loc_type\":\"S\",\"qty_allocated\":\"9\"},{\"to_lo...
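
To make the target shape concrete, here is a small plain-Python sketch that parses such a string into a list of struct-like records. It is an illustration only, not the accepted answer from the thread (which is not shown here). The Allocation type and parse_allocations helper are hypothetical names, and the sample holds just the two complete elements visible in the question.

```python
import json
from typing import List, NamedTuple

class Allocation(NamedTuple):
    # Field names are taken from the sample string in the question.
    to_loc: str
    to_loc_type: str
    qty_allocated: str

def parse_allocations(raw: str) -> List[Allocation]:
    """Parse a JSON array string into a list of struct-like records."""
    return [Allocation(**item) for item in json.loads(raw)]

sample = ('[{"to_loc":"6183","to_loc_type":"S","qty_allocated":"18"},'
          '{"to_loc":"6137","to_loc_type":"S","qty_allocated":"9"}]')
rows = parse_allocations(sample)
print(rows[0].to_loc, rows[1].qty_allocated)  # prints: 6183 9
```

In Spark itself, the analogous step is usually from_json applied to the string column with an array-of-struct schema such as array<struct<to_loc:string,to_loc_type:string,qty_allocated:string>>.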

Latest Reply
-werners-
Esteemed Contributor III
  • 7 kudos

Can you mark the question as answered so others can find the solution?

4 More Replies
LorenzoRovere
by New Contributor II
  • 2202 Views
  • 2 replies
  • 0 kudos

Hi all, my organization has changed our domain emails and now all Databricks users can't log in. We can only log in to the Azure portal with our new dom...

Hi all, my organization has changed our domain emails and now all Databricks users can't log in. We can only log in to the Azure portal with our new domain email. The message is the following (using the new domain). I wonder if there is a way to upload all us...

Latest Reply
LorenzoRovere
New Contributor II
  • 0 kudos

Hi @Prabakar Ammeappin, thanks for your response. I wanted to know if the domain name change is transparent within the same workspace. We don't need to migrate data, only replace the old domain with the new domain. Do you think this is possible?

1 More Replies
Sunny
by New Contributor III
  • 13313 Views
  • 1 replies
  • 1 kudos

Resolved! Maximum duration of the Databricks job before it times out

May I know the maximum duration a job is allowed to run if Timeout is not set? https://docs.databricks.com/data-engineering/jobs/jobs.html

Latest Reply
Sivaprasad1
Valued Contributor II
  • 1 kudos

This is part of the configuration of the task itself, so if no timeout is specified, it can theoretically run forever (e.g. streaming use case). Please refer to the timeout section in the link below. https://docs.databricks.com/dev-tools/api/latest/jobs.html#ope...
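
If an upper bound is wanted, a timeout can be set explicitly in the job settings. The sketch below assumes the timeout_seconds field of the Jobs API job settings; with_timeout and the job name are hypothetical, and the snippet only builds the settings dictionary without sending it anywhere.

```python
def with_timeout(job_settings, hours):
    """Return a copy of job_settings with timeout_seconds set."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    updated = dict(job_settings)  # shallow copy; the input is not modified
    updated["timeout_seconds"] = int(hours * 3600)
    return updated

# "nightly-load" is a made-up job name used for illustration.
settings = with_timeout({"name": "nightly-load"}, hours=6)
print(settings["timeout_seconds"])  # prints 21600
```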

mihai
by New Contributor III
  • 8809 Views
  • 7 replies
  • 31 kudos

Resolved! Workspace deployment on AWS - CloudFormation Issue

Hello, I have been trying to deploy a workspace on AWS using the quickstart feature, and I have been running into a problem where the stack fails when trying to create a resource. The following resource(s) failed to create: [CopyZips]. From the CloudWat...

Latest Reply
GarethGraphy
New Contributor III
  • 31 kudos

Dropping by with my experience in case anyone lands here via Google. Note that the databricks-prod-public-cfts bucket is located in us-west-2. If your AWS organisation has an SCP which whitelists specific regions (such as this example) and us-west-2 is...

6 More Replies
Shay
by New Contributor III
  • 8276 Views
  • 8 replies
  • 6 kudos

Resolved! How do you Upload TXT and CSV files into Shared Workspace in Databricks?

I am trying to upload the files I need into the right directory of the project. The files are zipped first as that is an accepted format. I have a Python project which requires the TXT and CSV format files as they are called and used via .py files ...

Latest Reply
-werners-
Esteemed Contributor III
  • 6 kudos

@Shay Alam, can you share the code with which you read the files? Apparently Python interprets the file format as a language, so it seems like some options are not filled in correctly.

7 More Replies
PJ
by New Contributor III
  • 4263 Views
  • 8 replies
  • 1 kudos

Please bring back notebook names in Google Chrome tabs. This feature seems to have disappeared within the last 24 hours. Now, each tab just reads "...

Please bring back notebook names in Google Chrome tabs. This feature seems to have disappeared within the last 24 hours. Now, each tab just reads "Databricks" at the top. I often have multiple Databricks scripts open at the same time and it is re...

Latest Reply
Prabakar
Databricks Employee
  • 1 kudos

The fix has been pushed to all regions during their release maintenance window. So if your workspace is deployed with the new release, then you should be able to see the notebook names in the browser tab.

7 More Replies
zesdatascience
by New Contributor III
  • 4420 Views
  • 2 replies
  • 2 kudos

Resolved! Delta Live Tables with CDC and Database Views with Lower Case Names

Hi, I am testing out creating some Delta Live Tables using Change Data Capture and having an issue where the resulting views that are created have lower case column names. Here is the function I am using to ingest data: def raw_to_ods_merge(table_name,s...

Latest Reply
zesdatascience
New Contributor III
  • 2 kudos

Hi @Kaniz Fatma, I have not found a solution just yet, but it is not a priority as most users will be accessing through Databricks SQL, so no further assistance is required right now. Thanks.

1 More Replies
