Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

AsfandQ
by New Contributor III
  • 21668 Views
  • 7 replies
  • 6 kudos

Resolved! Delta tables: Cannot set default column mapping mode to "name" in Python for delta tables

Hello, I am trying to write Delta files for some CSV data. When I do csv_dataframe.write.format("delta").save("/path/to/table.delta") I get: AnalysisException: Found invalid character(s) among " ,;{}()\n\t=" in the column names of your schema. Having look...

Latest Reply
Personal1
New Contributor II
  • 6 kudos

I still get the error when I try any method. The column names with spaces are throwing error [DELTA_INVALID_CHARACTERS_IN_COLUMN_NAMES] Found invalid character(s) among ' ,;{}()\n\t=' in the column names of your schema.
df1.write.format("delta") \
  .mo...

6 More Replies
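For anyone hitting the same DELTA_INVALID_CHARACTERS_IN_COLUMN_NAMES error, a minimal sketch of the two usual workarounds; the paths, the renaming rule, and the use of the DeltaTable builder are illustrative assumptions rather than the accepted answer verbatim:

# Option 1: sanitize the column names before writing
clean_df = csv_dataframe.toDF(
    *[c.strip().replace(" ", "_") for c in csv_dataframe.columns]
)
clean_df.write.format("delta").mode("overwrite").save("/path/to/table_delta")

# Option 2: keep the original (spaced) names by creating the table with
# column mapping mode "name" first, then appending the data
from delta.tables import DeltaTable

(DeltaTable.createIfNotExists(spark)
    .location("/path/to/table_delta")
    .addColumns(csv_dataframe.schema)
    .property("delta.columnMapping.mode", "name")
    .property("delta.minReaderVersion", "2")
    .property("delta.minWriterVersion", "5")
    .execute())

csv_dataframe.write.format("delta").mode("append").save("/path/to/table_delta")

Note that column mapping upgrades the table protocol, so older readers may no longer be able to read the table.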
PB-Data
by New Contributor III
  • 3513 Views
  • 5 replies
  • 0 kudos

Web Terminal

How can I use the web terminal within my Azure Databricks workspace if the workspace is provisioned with private endpoints, i.e. Allow Public Network Access is disabled? I have tried accessing the web terminal from the Apps tab and the bottom panel of a notebook. Th...

Latest Reply
gchandra
Databricks Employee
  • 0 kudos

Gotcha. See whether this link is helpful. https://learn.microsoft.com/en-us/azure/databricks/connect/storage/tutorial-azure-storage#grant-your-azure-databricks-workspace-access-to-azure-data-lake-storage-gen2

4 More Replies
NanthakumarYoga
by New Contributor II
  • 13568 Views
  • 2 replies
  • 2 kudos

Partition in Spark

Hi Community, I need your help understanding the topics below. I have a huge transaction file (20 GB), a Parquet file partitioned by the transaction_date column. The data is evenly distributed (no skew). There are 10 days of data and we have 10 partition f...

Latest Reply
Personal1
New Contributor II
  • 2 kudos

I read a .zip file in Spark and get unreadable data when I run show() on the data frame. When I check the number of partitions using df.rdd.getNumPartitions(), I get 8 (the number of cores I am using). Shouldn't the partition count be just 1 as I read...

1 More Replies
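Not an answer from the thread, but a quick way to see how Spark actually splits a date-partitioned Parquet dataset into tasks; the path below is hypothetical and the 128 MB figure is the default of spark.sql.files.maxPartitionBytes:

# Hypothetical path to the 20 GB transactions dataset partitioned by transaction_date
df = spark.read.parquet("/mnt/raw/transactions")

# Number of input partitions planned for a full scan; driven mostly by file sizes
# and spark.sql.files.maxPartitionBytes (128 MB by default), not by the 10 date folders
print(df.rdd.getNumPartitions())

# Filtering on the partition column prunes the scan to that folder's files only
one_day = df.where("transaction_date = '2024-01-01'")
print(one_day.rdd.getNumPartitions())

# Wide transformations use a separate setting for the shuffle side
print(spark.conf.get("spark.sql.shuffle.partitions"))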
FedericoRaimond
by New Contributor III
  • 7606 Views
  • 10 replies
  • 3 kudos

Azure Databricks Workflows with Git Integration

Hello, I receive a very weird error when attempting to connect my workflow tasks to a remote Git repo on Azure DevOps. As per the documentation: "For a Git repository, the path relative to the repository root." Then, I use directly the name of the notebook file...

Latest Reply
nicole_lu_PM
Databricks Employee
  • 3 kudos

Hi Federico, The error in Error 1.png didn't look right. Since you already selected the git source for the job, you should be able to use a relative path. If you continue to run into this issue, can you please submit a support ticket if you have a Su...

9 More Replies
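For reference, a hedged sketch of a Jobs API 2.1 payload with a Git source and a repo-relative notebook path, which is the setup the reply describes; the repo URL, cluster ID, workspace URL, and token are placeholders:

import requests

payload = {
    "name": "git-backed-job",
    "git_source": {
        "git_url": "https://dev.azure.com/myorg/myproject/_git/myrepo",  # placeholder
        "git_provider": "azureDevOpsServices",
        "git_branch": "main",
    },
    "tasks": [
        {
            "task_key": "run_notebook",
            "notebook_task": {
                # Path relative to the repository root, without the file extension
                "notebook_path": "notebooks/my_notebook",
                "source": "GIT",
            },
            "existing_cluster_id": "<cluster-id>",  # placeholder
        }
    ],
}

resp = requests.post(
    "https://<workspace-host>/api/2.1/jobs/create",  # placeholder workspace URL
    headers={"Authorization": "Bearer <token>"},     # placeholder token
    json=payload,
)
resp.raise_for_status()
print(resp.json())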
databrickser
by New Contributor
  • 2081 Views
  • 2 replies
  • 0 kudos

Updating records with auto loader

I want to ingest JSON files from an S3 bucket into a Databricks table using Auto Loader. A job runs every few hours to write the combined JSON data to the table. Some records might be updates to existing records, identifiable by a specific key. I want...

Latest Reply
filipniziol
Esteemed Contributor
  • 0 kudos

Hi @databrickser, in theory it is possible to use Auto Loader with foreachBatch to update existing records. The code snippet below shows a working solution:
from datetime import datetime
from pyspark.sql import DataFrame
from pyspark.sql.functions impo...

1 More Replies
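A generic sketch of the Auto Loader plus foreachBatch upsert pattern the reply describes (not filipniziol's snippet itself); the table name, S3 prefixes, and record_key column are illustrative:

from delta.tables import DeltaTable
from pyspark.sql import DataFrame

TARGET = "main.bronze.events"                         # illustrative target table
CHECKPOINT = "s3://my-bucket/_checkpoints/events"     # illustrative checkpoint path

def upsert_batch(batch_df: DataFrame, batch_id: int) -> None:
    # Deduplicate within the micro-batch, then MERGE on the business key
    deduped = batch_df.dropDuplicates(["record_key"])
    (DeltaTable.forName(spark, TARGET).alias("t")
        .merge(deduped.alias("s"), "t.record_key = s.record_key")
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute())

stream = (spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", CHECKPOINT + "/schema")
    .load("s3://my-bucket/raw/events/"))              # illustrative source prefix

(stream.writeStream
    .foreachBatch(upsert_batch)
    .option("checkpointLocation", CHECKPOINT)
    .trigger(availableNow=True)                       # fits the "every few hours" job
    .start())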
varonis_evgeniy
by New Contributor
  • 1049 Views
  • 2 replies
  • 0 kudos

Single task job that runs SQL notebook, can't retrieve results

Hello, we are integrating Databricks and I need to run a job with a single task that runs a notebook with a SQL query in it. I can only use a SQL warehouse and no cluster, and I need to retrieve the result of the notebook task, but I can't see the results. Is...

Data Engineering
dbutils
Notebook
sql
Latest Reply
adriennn
Valued Contributor
  • 0 kudos

> I need to retrieve a result of the notebook task
If you want to know whether the task run has succeeded or not, you can enable the "lakeflow" system schema and you'll find the logs of job and task runs. You could then use the above info to execute a...

1 More Replies
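As a rough illustration of the system-tables route mentioned above, assuming the jobs system schema is enabled on the workspace; verify the exact table and column names against the system.lakeflow documentation before relying on them:

# Query recent job run outcomes from the jobs system tables (names assumed)
runs = spark.sql("""
    SELECT job_id, run_id, result_state, period_start_time, period_end_time
    FROM system.lakeflow.job_run_timeline
    WHERE result_state IS NOT NULL
    ORDER BY period_end_time DESC
    LIMIT 20
""")
runs.show(truncate=False)

This reports run status and timing, not the notebook's query output, which matches the point made in the reply.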
sungsoo
by New Contributor
  • 683 Views
  • 1 reply
  • 0 kudos

AWS Role of NACL outbound 3306 port

When using Databricks on AWS, I need to open port 3306 in the NACL outbound rules of the subnet where the endpoint is located. I understand this is to communicate with the Databricks metastore from the instance. Am I right in understanding this? If not, please let me ...

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

You are correct, the intention of this port is to connect to the Hive metastore.

Brad
by Contributor II
  • 5074 Views
  • 1 reply
  • 0 kudos

How databricks assign memory and cores

Hi team, we are using a job cluster with a 128 GB memory + 16 core node type for a workflow. From the documentation we know one worker is one node and is one executor. From the Spark UI Environment tab we can see spark.executor.memory is 24G, and from metrics we can see the m...

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

Databricks allocates resources to executors on a node based on several factors, and it appears that your cluster configuration is using default settings since no specific Spark configurations were provided. Executor Memory Allocation: The spark.exec...

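A small, hedged way to confirm what was actually allocated on such a cluster; the override values in the comments are examples only, not recommendations:

# Inspect the executor settings Databricks applied to this cluster
sc_conf = spark.sparkContext.getConf()
print(sc_conf.get("spark.executor.memory", "not set"))   # e.g. "24g" heap per executor
print(sc_conf.get("spark.executor.cores", "not set"))
print(spark.sparkContext.defaultParallelism)             # total task slots across workers

# To change the defaults, set values such as
#   spark.executor.memory 32g
#   spark.executor.cores 8
# in the cluster's Spark config before the cluster starts.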
AxelM
by New Contributor
  • 1506 Views
  • 1 reply
  • 0 kudos

Asset Bundles from Workspace for CI/CD

Hello there, I am exploring the possibilities for CI/CD from a DEV workspace to PROD. Besides the notebooks (which can easily be handled by the Git provider), I am mainly interested in the deployment of Jobs/Clusters/DDL... I cannot find a tutorial anywhere ...

Latest Reply
datastones
Contributor
  • 0 kudos

I think the DAB MLOps Stacks template is pretty helpful re: how to bundle, schedule and trigger custom jobs: https://docs.databricks.com/en/dev-tools/bundles/mlops-stacks.html You can bundle init it locally and it should give you the skeleton of how to bu...

balwantsingh24
by New Contributor II
  • 3672 Views
  • 3 replies
  • 0 kudos

Resolved! java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMeta

Guys, please help me to solve this issue, I need it on a very urgent basis.

Latest Reply
saikumar246
Databricks Employee
  • 0 kudos

Hi @balwantsingh24, Internal Metastore: Internal metastores are managed by Databricks and are typically used to store metadata about databases, tables, views, and user-defined functions (UDFs). This metadata is essential for operations like the SHOW...

2 More Replies
Frustrated_DE
by New Contributor III
  • 2189 Views
  • 4 replies
  • 0 kudos

Delta live tables multiple .csv diff schemas

Hi all, I have a fairly straightforward task whereby I am looking to ingest six .csv files, all with different names, schemas and blob locations, into individual tables in one bronze schema. I have the files in my landing zone under different fol...

Latest Reply
Frustrated_DE
New Contributor III
  • 0 kudos

The code follows a similar pattern to the below to load the different tables.
import dlt
import re
import pyspark.sql.functions as F

landing_zone = '/Volumes/bronze_dev/landing_zone/'
source = 'addresses'

@dlt.table(comment="addresses snapshot", name="addresses")
de...

3 More Replies
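A generic sketch of the loop pattern this thread converges on, one DLT table per source folder with its own schema; the source list and folder layout are illustrative:

import dlt

landing_zone = "/Volumes/bronze_dev/landing_zone/"
sources = ["addresses", "customers", "orders"]   # illustrative; one folder per source

def make_table(source_name: str):
    @dlt.table(name=source_name, comment=f"{source_name} snapshot")
    def _load():
        # Each folder has its own header and schema; Auto Loader infers it per table
        return (spark.readStream.format("cloudFiles")
                .option("cloudFiles.format", "csv")
                .option("header", "true")
                .load(f"{landing_zone}{source_name}/"))
    return _load

for s in sources:
    make_table(s)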
Braxx
by Contributor II
  • 11859 Views
  • 4 replies
  • 3 kudos

Resolved! cluster creation - access mode option

I am a bit lazy and trying to manually recreate a cluster I have in one workspace in another one. The cluster was created some time ago. Looking at the configuration, the access mode field is "custom". When trying to create a new cluster, I do not...

Latest Reply
khushboo20
New Contributor II
  • 3 kudos

Hi All - I am new to Databricks and trying to create my first workflow. For some reason, the cluster created is of type "custom". I have not mentioned it anywhere in my asset bundle. Due to this, I cannot get the Unity Catalog feature. Could ...

3 More Replies
tonyd
by New Contributor II
  • 984 Views
  • 1 reply
  • 0 kudos

Getting error "Serverless Generic Compute Cluster Not Supported For External Creators."

Getting the above-mentioned error while creating serverless compute. This is the request:
curl --location 'https://adb.azuredatabricks.net/api/2.0/clusters/create' \
  --header 'Content-Type: application/json' \
  --header 'Authorization: ••••••' \
  --data '{...

Latest Reply
saikumar246
Databricks Employee
  • 0 kudos

Hi @tonyd, thank you for reaching out to the Databricks Community. You are trying to create a serverless generic compute cluster, which is not supported: you cannot create a serverless compute cluster. As per the link below, if you observe, there is no...

PushkarDeole
by New Contributor III
  • 1921 Views
  • 2 replies
  • 0 kudos

Unable to set shuffle partitions on DLT pipeline

Hello, we are using a 5-worker-node DLT job compute for a continuous-mode streaming pipeline. The worker configuration is Standard_D4ads_v5, i.e. 4 cores, so the total across 5 workers is 20 cores. We have wide transformations at some places in the pipe...

Latest Reply
gchandra
Databricks Employee
  • 0 kudos

Try setting  spark.sql.shuffle.partitions to auto

1 More Replies
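For context on the suggestion above: on a regular cluster the setting can be changed per session, while for a DLT pipeline it is normally supplied through the pipeline's configuration block; the JSON key in the comment below shows the usual place for it:

# Per-session (non-DLT) override; "auto" lets Databricks/AQE pick the shuffle width
spark.conf.set("spark.sql.shuffle.partitions", "auto")
print(spark.conf.get("spark.sql.shuffle.partitions"))

# For a DLT pipeline, add the same key-value pair to the pipeline settings, e.g.
#   "configuration": { "spark.sql.shuffle.partitions": "auto" }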
ckwan48
by New Contributor III
  • 25127 Views
  • 6 replies
  • 3 kudos

Resolved! How to prevent my cluster to shut down after inactivity

Currently, I am running a cluster that is set to terminate after 60 minutes of inactivity. However, in one of my notebooks, one of the cells is still running. How can I prevent this from happening if I want my notebook to run overnight without monito...

Latest Reply
AmanSehgal
Honored Contributor III
  • 3 kudos

If a cell is already running (I assume it's a streaming operation), then I think it doesn't mean that the cluster is inactive. The cluster should be running if a cell is running on it. On the other hand, if you want to keep running your clusters for ...

5 More Replies
