Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Constantine
by Contributor III
  • 9444 Views
  • 4 replies
  • 4 kudos

Resolved! How does Spark do lazy evaluation?

For context, I am running Spark on databricks platform and using Delta Tables (s3). Let's assume we a table called table_one. I create a view called view_one using the table and then call view_one. Next, I create another view, called view_two based o...

Latest Reply
jose_gonzalez
Databricks Employee

Hi @John Constantine, the following notebook URL will help you better understand the difference between lazy transformations and actions in Spark. You will be able to compare the physical query plans and better understand what is going on when you e...

3 More Replies
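
A minimal PySpark sketch of the lazy evaluation discussed above, assuming a hypothetical Delta table table_one with id and amount columns (spark is the ambient session in a Databricks notebook): each transformation only extends the query plan, and nothing is read from the Delta files until an action runs.

# Transformations: each line only extends the logical plan; no data is read yet.
df = spark.read.table("table_one")            # hypothetical Delta table
view_one = df.filter("amount > 0")            # hypothetical filter
view_two = view_one.select("id", "amount")    # hypothetical columns

view_two.explain()   # prints the physical plan Spark would run
view_two.count()     # action: only now is the whole chain actually executed
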
RantoB
by Valued Contributor
  • 2248 Views
  • 2 replies
  • 4 kudos

Resolved! Import a notebook in a Release Pipeline with a Python script

Hi, I would like to import a Python file to Databricks with an Azure DevOps Release Pipeline. Within the pipeline I execute a Python script which contains this code: import sys import os import base64 import requests dbw_url = sys.argv[1] # https://a...

Latest Reply
Hubert-Dudek
Esteemed Contributor III

Recently I wrote about an alternative way to export/import notebooks in Python: https://community.databricks.com/s/question/0D53f00001TgT52CAF/import-notebook-with-python-script-using-api This way you will get a more readable error message (often it is rela...

1 More Replies
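
For reference, a minimal sketch of the Workspace Import API call such a release-pipeline script makes (URL, token, and paths are hypothetical placeholders; the notebook content must be base64-encoded):

import base64
import requests

dbw_url = "https://adb-1234567890.0.azuredatabricks.net"  # hypothetical workspace URL
token = "dapi..."                                          # hypothetical access token

# The Workspace Import API expects the file content base64-encoded.
with open("notebook.py", "rb") as f:
    content = base64.b64encode(f.read()).decode("utf-8")

resp = requests.post(
    f"{dbw_url}/api/2.0/workspace/import",
    headers={"Authorization": f"Bearer {token}"},
    json={
        "path": "/Shared/notebook",   # hypothetical target path
        "format": "SOURCE",
        "language": "PYTHON",
        "content": content,
        "overwrite": True,
    },
)
# Printing the response body gives the readable error message mentioned above.
print(resp.status_code, resp.text)
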
Hubert-Dudek
by Esteemed Contributor III
  • 2150 Views
  • 2 replies
  • 13 kudos

Resolved! Something like AWS Macie to perform scans on Azure Data Lake

Does anyone know an alternative to AWS Macie in Azure? AWS Macie scans S3 buckets for files with sensitive data (personal addresses, credit cards, etc.). I would like a ready-made scanner in the same style for Azure Data Lake.

Latest Reply
Hubert-Dudek
Esteemed Contributor III

Thank you, I checked and yes, it is definitely the way to go.

1 More Replies
ahana
by New Contributor III
  • 12904 Views
  • 11 replies
  • 2 kudos
Latest Reply
jose_gonzalez
Databricks Employee

Hi @ahana ahana, did any of the replies help you solve this issue? Would you be happy to mark their answer as best so that others can quickly find the solution? Thank you.

10 More Replies
Chris_Shehu
by Valued Contributor III
  • 1697 Views
  • 2 replies
  • 2 kudos
Latest Reply
Prabakar
Databricks Employee

Hi @Christopher Shehu, if @Piper Wilson's response helped you solve your question, would you be happy to mark her answer as best so that others can quickly find the solution in the future?

1 More Replies
Orianh
by Valued Contributor II
  • 7529 Views
  • 4 replies
  • 2 kudos

Resolved! Read JSON with backslash.

Hello guys. I'm trying to read a JSON file which contains backslashes, and it fails to read via PySpark. I've tried a lot of options but haven't solved this yet. I thought to read all the JSON as text and replace all "\" with "/", but PySpark fails to read it as te...

Latest Reply
Anonymous
Not applicable

@orian hindi - Would you be happy to post the solution you came up with and then mark it as best? That will help other members.

3 More Replies
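
A minimal sketch of the read-as-text workaround the question describes, with a hypothetical input path: read the file as plain text so no JSON parsing happens yet, replace the backslashes, then parse the cleaned strings.

from pyspark.sql import functions as F

# Read the raw file as plain text; Spark does not try to parse JSON at this stage.
raw = spark.read.text("/mnt/data/input.json")  # hypothetical path

# The raw string r"\\" is a regex escape for one literal backslash.
cleaned = raw.select(F.regexp_replace("value", r"\\", "/").alias("value"))

# Parse the cleaned lines as JSON.
parsed = spark.read.json(cleaned.rdd.map(lambda r: r.value))
parsed.printSchema()
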
dataEngineer3
by New Contributor II
  • 4601 Views
  • 8 replies
  • 0 kudos

Hi All, I am trying to read a CSV file from the data lake and load the data into a SQL table using COPY INTO. I am facing an issue. Here I created one table wit...

Hi All, I am trying to read a CSV file from the data lake and load the data into a SQL table using COPY INTO, but I am facing an issue. Here I created one table with 6 columns, the same as the data in the CSV file, but I am unable to load the data. Can anyone help me with this?

Latest Reply
dataEngineer3
New Contributor II

Thanks Werners for your reply. How do I pass a schema (column names and types) for the CSV file?

7 More Replies
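
A minimal sketch of one way to give COPY INTO explicit column names and types (table name, columns, and path are all hypothetical): create the target table with the schema first, then let the CSV format options handle the header row and type casting.

# Create the target table with explicit column names and types up front.
spark.sql("""
    CREATE TABLE IF NOT EXISTS my_table (      -- hypothetical table and columns
        id INT, name STRING, city STRING,
        amount DOUBLE, ts TIMESTAMP, flag BOOLEAN
    )
""")

# 'header' tells COPY INTO to treat the first CSV line as column names;
# 'inferSchema' lets it cast the string values to the column types.
spark.sql("""
    COPY INTO my_table
    FROM 'abfss://container@account.dfs.core.windows.net/raw/data.csv'   -- hypothetical path
    FILEFORMAT = CSV
    FORMAT_OPTIONS ('header' = 'true', 'inferSchema' = 'true')
""")
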
kjoth
by Contributor II
  • 6701 Views
  • 4 replies
  • 3 kudos

Resolved! Pyspark logging - custom to Azure blob mount directory

I'm using the logging module to log the events from the job, but it seems the log file is created with only one line. The subsequent log events are not being recorded. Is there any reference for custom logging in Databricks?

Latest Reply
Anonymous
Not applicable

@karthick J - If Jose's answer helped solve the issue, would you be happy to mark their answer as best so that others can find the solution more easily?

3 More Replies
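
A minimal sketch of a common workaround for the one-line log file (paths and logger name are hypothetical): blob-backed mounts handle the incremental appends a FileHandler makes poorly, so write the log to the driver's local disk and copy it to the mount once, at the end of the job.

import logging

local_log = "/tmp/job.log"  # hypothetical local path on the driver

handler = logging.FileHandler(local_log)
handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))

logger = logging.getLogger("my_job")  # hypothetical logger name
logger.setLevel(logging.INFO)
logger.addHandler(handler)

logger.info("job started")
# ... job steps ...
logger.info("job finished")

# dbutils is available in Databricks notebooks; copy the finished file to the mount.
dbutils.fs.cp(f"file:{local_log}", "dbfs:/mnt/logs/job.log")  # hypothetical mount path
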
AjayHN
by New Contributor II
  • 3295 Views
  • 1 reply
  • 2 kudos

Resolved! Notebook failing in job-cluster but runs fine in all-purpose-cluster with the same configuration

I have a notebook with many join and a few persist operations, which runs fine on an all-purpose cluster (with i3.xlarge worker nodes and autoscaling enabled), but the same notebook fails on a job cluster with the same cluster definition (to be frank the ...

Latest Reply
jose_gonzalez
Databricks Employee

Hi @Ajay Nanjundappa, check the "Event log" tab and search for any spot termination events. It seems like all your nodes are spot instances. The "FetchFailedException" error is associated with spot instance terminations.

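
If spot terminations turn out to be the cause, a sketch of a Jobs API new_cluster definition that keeps at least the driver on an on-demand instance and lets workers fall back from spot (all values are illustrative, not a prescribed configuration):

# Goes in the "new_cluster" field of a Jobs API job definition.
new_cluster = {
    "spark_version": "10.4.x-scala2.12",       # illustrative runtime version
    "node_type_id": "i3.xlarge",
    "autoscale": {"min_workers": 2, "max_workers": 8},
    "aws_attributes": {
        "first_on_demand": 1,                  # at least the driver on-demand
        "availability": "SPOT_WITH_FALLBACK",  # fall back to on-demand if spot is reclaimed
    },
}
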
Andriy_Shevchen
by New Contributor
  • 2885 Views
  • 2 replies
  • 3 kudos

Resolved! yarn.nodemanager.resource.memory-mb parameter update

I am currently working on determining the proper cluster size for my Spark application, and I have a question regarding the Hadoop configuration parameter yarn.nodemanager.resource.memory-mb. From what I see, this parameter is responsible for setting the phys...

Latest Reply
jose_gonzalez
Databricks Employee

Hi @Andriy Shevchenko, Databricks does not use YARN. I recommend you try Databricks Community Edition (link) to get familiar and explore. You can check the Ganglia UI to see the cluster utilization: memory, CPU, IO, etc.

1 More Replies
Sebastian
by Contributor
  • 7858 Views
  • 3 replies
  • 1 kudos

How to access a Databricks secret in a global init file

How do I access a Databricks secret in a global init file? {{secrets/scope/key}} doesn't work. Do I have to put that inside quotes?

Latest Reply
jose_gonzalez
Databricks Employee

Hi @SEBIN THOMAS, I would like to share the docs here. Are you getting any error messages? As @Hubert Dudek mentioned, please share more details and the error message in case you are getting any.

2 More Replies
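
For reference, a short sketch of the two usual ways to reference a secret (scope and key names are hypothetical); in a cluster's Spark config or environment variables, the {{secrets/...}} reference is written without quotes:

# In notebook or job code, read the secret with dbutils:
token = dbutils.secrets.get(scope="my-scope", key="my-key")  # hypothetical scope/key

# In the cluster's Spark config (one key and value per line), reference it
# without quotes, e.g.:
#   spark.my.setting {{secrets/my-scope/my-key}}
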
Mohit_m
by Valued Contributor II
  • 2248 Views
  • 5 replies
  • 2 kudos

Which REST API to use in order to list the groups that a specific user belongs to

Which REST API should I use in order to list the groups that a specific user belongs to?

Latest Reply
jose_gonzalez
Databricks Employee

@Mohit Miglani, make sure to select the best answer so the post will be moved to the top and will help in case more users have this question in the future.

4 More Replies
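
One option, sketched below with a hypothetical host, token, and user name: the SCIM Users API returns each matching user resource with a groups attribute.

import requests

host = "https://adb-1234567890.0.azuredatabricks.net"  # hypothetical workspace URL
token = "dapi..."                                      # hypothetical access token

resp = requests.get(
    f"{host}/api/2.0/preview/scim/v2/Users",
    headers={"Authorization": f"Bearer {token}"},
    params={"filter": 'userName eq "someone@example.com"'},  # hypothetical user
)
resp.raise_for_status()

for user in resp.json().get("Resources", []):
    # Each SCIM user resource lists the groups it belongs to.
    print([g["display"] for g in user.get("groups", [])])
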
Nosa
by New Contributor II
  • 1885 Views
  • 3 replies
  • 4 kudos

Resolved! Adding Databricks to my application

I am developing an application with Python and Godot, and I want to use Databricks in it. How can I integrate Databricks into my application?

Latest Reply
jose_gonzalez
Databricks Employee

Hi @Ensiyeh Shojaei, which cloud service are you using? Depending on the cloud provider, you will have a list of tools that can help you connect to and interact with Databricks from your application.

2 More Replies
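
For a Python application, one common option is the Databricks SQL Connector (pip install databricks-sql-connector); a minimal sketch with hypothetical connection details:

from databricks import sql

# Hostname, HTTP path, and token come from your workspace / SQL endpoint.
with sql.connect(
    server_hostname="adb-1234567890.0.azuredatabricks.net",  # hypothetical
    http_path="/sql/1.0/warehouses/abc123",                  # hypothetical
    access_token="dapi...",                                  # hypothetical
) as conn:
    with conn.cursor() as cur:
        cur.execute("SELECT 1")
        print(cur.fetchall())
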
schmit89
by New Contributor
  • 2976 Views
  • 1 reply
  • 1 kudos

Resolved! Downstream duration timeout

I'm trying to upload a 0.5 GB file for a school lab, and when I drag the file to DBFS it uploads for about 30 seconds and then I receive a downstream duration timeout error. What can I do to solve this issue?

Latest Reply
jose_gonzalez
Databricks Employee

Hi @Jason Schmit, your file might be too large to upload using the upload interface (docs). I recommend splitting it up into smaller files. You can also use the DBFS CLI or dbutils to upload your file.

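
A minimal sketch of the CLI route from the reply, driven from Python (file name and DBFS destination are hypothetical; requires the Databricks CLI to be installed and configured with 'databricks configure'):

import subprocess

# The CLI streams the upload in chunks, which avoids the browser upload timeout.
subprocess.run(
    ["databricks", "fs", "cp", "lab_data.csv", "dbfs:/FileStore/lab_data.csv"],
    check=True,
)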
