The new Databricks jobs matrix is awesome! But looking at it can be addictive.
I'm trying to read a file from a Google Cloud Storage bucket. The filename starts with a period, so Spark assumes the file is hidden and won't let me read it. My code is similar to this: from pyspark.sql import SparkSession spark = SparkSession.build...
I don't think there is an easy way to do this. Getting around these constraints would also break very basic functionality (such as being able to read Delta tables). I suggest you run a rename job first and then read the renamed file.
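A minimal sketch of the rename step suggested here, using only the Python standard library on a local temporary directory as a stand-in for the bucket. The helper name `rename_without_leading_dot`, the `staged` folder, and the sample file `.events.csv` are hypothetical; on a real GCS bucket the same move would go through the storage client or `dbutils.fs.mv` rather than `pathlib`.

```python
import tempfile
from pathlib import Path


def rename_without_leading_dot(src: Path, dest_dir: Path) -> Path:
    """Move a dot-prefixed file into dest_dir under a name Spark will not treat as hidden."""
    visible_name = src.name.lstrip(".")
    if not visible_name:
        raise ValueError(f"cannot derive a visible name from {src.name!r}")
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / visible_name
    src.rename(dest)  # the actual "rename job"
    return dest


# Stand-in for the bucket: a temporary local directory (assumption for illustration).
with tempfile.TemporaryDirectory() as tmp:
    hidden = Path(tmp) / ".events.csv"
    hidden.write_text("id,value\n1,a\n")

    visible = rename_without_leading_dot(hidden, Path(tmp) / "staged")
    visible_name = visible.name
    staged_text = visible.read_text()
    still_hidden = hidden.exists()

print(visible_name)
```

After the rename, the reader would point at the staged path instead of the original dot-prefixed one.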
I want the code based on the attached commands file. The source file is attached (JSON). The tables are in tabs in the Excel sheet. Kindly give me the code for 3-5 tables for my understanding.
Hi everyone, I am having trouble adding a deduplication step to a file stream that is already running. The code I am trying to add is this one: df = df.withWatermark("arrival_time", "20 minutes")\ .dropDuplicates(["event_id", "arrival_time"])...
Hi, we have a scenario where we need to deploy 15 Spark streaming applications on Databricks, reading from Kafka, to a single job cluster. We tried the following approach: 1. Create job 1 with a new job cluster (C1). 2. Create job 2 pointing to C1... 3. Create job 15...
@Hubert Dudek, thanks a lot for responding. With a setup like this, if one task fails, it will not terminate the entire job, right? Since the job runs continuously as a streaming app, is it possible to add a new task to the job (while i...
Hi, since yesterday, for no known reason, some commands that used to run daily are now stuck in a "Running command" state. Commands such as: dataframe.toPandas() dataframe.show(n=1) dataframe.description() dataframe.write.format("csv").save(location) ge...
Hi @Luiz Carneiro, could you split your Spark actions into more CMD cells (paragraphs) and run them one at a time to check which one is taking the extra time? Also, Pandas only runs on your driver. Have you tried using the Python or Scala APIs instead? in ca...
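One way to act on this advice is to time each action separately. The sketch below is only an illustration: `time_actions` is a hypothetical helper, and the two lambdas are stand-ins for real calls such as `dataframe.show(n=1)` or the CSV write, which are not reproduced here.

```python
import time


def time_actions(actions):
    """Run each labelled action once and return its wall-clock duration in seconds."""
    timings = {}
    for label, action in actions.items():
        start = time.perf_counter()
        action()
        timings[label] = time.perf_counter() - start
    return timings


# Stand-ins for Spark actions (assumption: replace with the real calls in a notebook).
actions = {
    "show": lambda: sum(range(1000)),      # cheap placeholder
    "write_csv": lambda: time.sleep(0.05),  # slow placeholder
}

timings = time_actions(actions)
slowest = max(timings, key=timings.get)
print(f"slowest action: {slowest}")
```

Running one action per cell gives the same information in the notebook UI; the helper just collects the numbers in one place.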
Hi! I have many questions about Spark Streaming and Event Hubs. Can you help me? Q1: How can I improve the Spark Streaming writer's input rate and processing rate? I connect to Azure Event Hubs using Spark Streaming (Azure Databricks), but I found that if I use display, this ...
My problem is that the setMaxEventsPerTrigger value is not equal to numInputRow.
How to execute matplotlib animations in a Databricks notebook?
Please refer to the example code below and use displayHTML(ani.to_jshtml()) to execute matplotlib animations in a Databricks notebook: import matplotlib.pyplot as plt import matplotlib.animation import numpy as np t = np.linspace(0,2*np.pi) x = np.si...
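Since the snippet in this reply is cut off, here is a self-contained sketch of the same idea. Everything past the first lines is an assumption: the sine curve, the `animate` function, the frame step, and the `Agg` backend choice are illustrative, not the original poster's code. `displayHTML` only exists inside a Databricks notebook, so the sketch falls back to printing the HTML length elsewhere.

```python
import matplotlib

matplotlib.use("Agg")  # assumption: headless rendering is enough, the output is HTML

import matplotlib.animation
import matplotlib.pyplot as plt
import numpy as np

t = np.linspace(0, 2 * np.pi)
x = np.sin(t)  # assumed curve; the original snippet is truncated at this point

fig, ax = plt.subplots()
ax.set_xlim(0, 2 * np.pi)
ax.set_ylim(-1.1, 1.1)
(line,) = ax.plot([], [])


def animate(i):
    """Draw the first i points of the curve."""
    line.set_data(t[:i], x[:i])
    return (line,)


# Every fifth point keeps the number of embedded frames small.
ani = matplotlib.animation.FuncAnimation(
    fig, animate, frames=range(0, len(t) + 1, 5), interval=100
)
html = ani.to_jshtml()
plt.close(fig)

try:
    displayHTML(html)  # available in Databricks notebooks only
except NameError:
    print(f"generated {len(html)} characters of HTML")
```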
Hey guys, is there a way to open a read-only pyodbc connection with the Simba Spark driver? At the moment, I'm able to execute queries such as select, delete, insert into: basically every SQL statement using pyodbc. I tried to open a pyodbc connection but ...
readonly=True works only with some drivers. Just create additional users that are granted read-only permission.
Hi, we have configured our infrastructure with Terraform in Azure. Now we want to configure GitLab integration with Databricks to automate notebook and job deployment. I saw that this step is currently available only via the Databricks UI. Can you share som...
Actually, the Repos API is already available: https://docs.databricks.com/dev-tools/api/latest/repos.html#operation/create-repo
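As a rough sketch of what a create-repo call against that API could look like, the snippet below builds (but does not send) a `POST /api/2.0/repos` request with the standard library. The workspace host, token, GitLab URL, and repo path are all placeholders, and `build_create_repo_request` is a hypothetical helper; check the linked reference for the exact fields your workspace accepts.

```python
import json
import urllib.request


def build_create_repo_request(host, token, git_url, provider, repo_path):
    """Build (without sending) a create-repo request for the Databricks Repos API."""
    payload = {"url": git_url, "provider": provider, "path": repo_path}
    return urllib.request.Request(
        url=f"{host.rstrip('/')}/api/2.0/repos",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# All values below are placeholders (assumptions for illustration).
req = build_create_repo_request(
    host="https://example-workspace.azuredatabricks.net",
    token="dapi-EXAMPLE-TOKEN",
    git_url="https://gitlab.com/example-group/example-project.git",
    provider="gitLab",
    repo_path="/Repos/ci/example-project",
)

print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req)
```

Building the request separately from sending it makes it easy to inspect the payload in a CI pipeline before pointing it at a real workspace.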
On a currently running all-purpose cluster, if you open the Spark UI, go to SQL, and open a task, you can see the details and SQL properties, but the visualization doesn't appear. That graph is very useful for debugging certain scenarios. It works fine on jobs. Any ...
I'm on Chrome. Sometimes it appears and sometimes it doesn't; I will look into that further to provide something reproducible. Thanks!
Accounts added after we turned on SSO don't allow me to restrict their cluster creation abilities. How can I undo this, so I can prevent business people from writing to ETLed data?
Never mind. It turns out someone was giving everyone admin privileges when they weren't supposed to, and I didn't notice.
Hi, I am using Databricks to load data from one Delta table into another Delta table. I'm using the Simba Spark JDBC connector to pull data from a Delta table in my source instance and write it into a Delta table in my Databricks instance. The source has...
Hi @govind@dqlabs.ai, just wanted to check in: were you able to resolve your issue, or do you need more help? We'd love to hear from you. Thanks!
Hi all, I have been having an issue working out how to do a uniqueness check as part of the quality check. Below is an example: @dlt.expect("origin_not_dup", "origin is distinct from origin") def harmonized_data(): df=dlt.read("raw_data") for col in...
How can we update the OpenSSL version for the cluster to address this vulnerability? https://ubuntu.com/security/CVE-2022-0778 I tried this global init script to auto-update the OpenSSL version, but it does not seem to work because apt-utils is missing. apt...
I can see the following in our internal communication. CVSSv3 score: 4.0 (Medium) AV:N/AC:H/PR:N/UI:N/S:C/C:N/I:N/A:L Reference: https://www.openssl.org/news/secadv/20220315.txt Severity: High. The BN_mod_sqrt() function, which computes a modular square root, ...